In my opinion, JDBI is simply the best database access framework for relational databases available for Java today. My new library, JDBQ, is essentially a port of JDBI from relational databases to Google BigQuery.

from Andy Boothe
In my opinion, JDBI is simply the best database access framework for relational databases available for Java today. My new library, JDBQ, is essentially a port of JDBI from relational databases to Google BigQuery.
If anyone needs a regular expression for matching single Emoji in BigQuery/RE2, this is a good place to start:(?:[\x{1F300}-\x{1F5FF}]|[\x{1F900}-\x{1F9FF}]|[\x{1F600}-\x{1F64F}]|[\x{1F680}-\x{1F6FF}]|[\x{2600}-\x{26FF}]\x{FE0F}?|[\x{2700}-\x{27BF}]\x{FE0F}?|\x{24C2}\x{FE0F}?|[\x{1F1E6}-\x{1F1FF}]{1,2}|[\x{1F170}\x{1F171}\x{1F17E}\x{1F17F}\x{1F18E}\x{1F191}-\x{1F19A}]\x{FE0F}?|[\\x{0023}\x{002A}\x{0030}-\x{0039}]\x{FE0F}?\x{20E3}|[\x{2194}-\x{2199}\x{21A9}-\x{21AA}]\x{FE0F}?|[\x{2B05}-\x{2B07}\x{2B1B}\x{2B1C}\x{2B50}\x{2B55}]\x{FE0F}?|[\x{2934}\x{2935}]\x{FE0F}?|[\x{3297}\x{3299}]\x{FE0F}?|[\x{1F201}\x{1F202}\x{1F21A}\x{1F22F}\x{1F232}\x{1F23A}\x{1F250}\x{1F251}]\x{FE0F}?|[\x{203C}-\x{2049}]\x{FE0F}?|[\x{00A9}-\x{00AE}]\x{FE0F}?|[\x{2122}\x{2139}]\x{FE0F}?|\x{1F004}\x{FE0F}?|\x{1F0CF}\x{FE0F}?|[\x{231A}\x{231B}\x{2328}\x{23CF}\x{23E9}\x{23F3}\x{23F8}\x{23FA}]\x{FE0F}?)
Inspired by zly394/EmojiRegex on GitHub.
I just released the first beta version of litecene, a Java library that implements a common boolean search syntax for full-text search, with its first transpiler for BigQuery.
Litecene uses a simple, user-friendly syntax that is derived from Lucene query syntax that should be familiar to most users. It includes clauses for term with wildcard, phrase with proximity, AND
, OR
, NOT
, and grouping.
As an example, this might be a good Litecene query to identify social media posts talking about common ways people user their smartphones:
(smartphone OR "smart phone" OR iphone OR "apple phone" OR android OR "google phone" OR "windows phone" OR "phone app"~8) AND (call* OR dial* OR app OR surf* OR brows* OR camera* OR pic* OR selfie)
The syntax is documented in more detail here.
Continue reading “Litecene: Full-Text Search for Google BigQuery”I was investigating the feasibility of putting natively-formatted Twitter data into BigQuery, and got pretty far along the way before deciding to go another direction. I found the schema in the twitter-for-bigquery project to be incomplete for my needs, so I made a new schema of my own. I’m making the schema available here in case it’s of use to anyone else.