In my opinion, JDBI is simply the best database access framework for relational databases available for Java today. My new library, JDBQ, is essentially a port of JDBI from relational databases to Google BigQuery.
Continue reading “JDBQ: A Database Access Framework for Java + BigQuery”BigQuery RegExp for Matching Single Emoji
If anyone needs a regular expression for matching single Emoji in BigQuery/RE2, this is a good place to start:(?:[\x{1F300}-\x{1F5FF}]|[\x{1F900}-\x{1F9FF}]|[\x{1F600}-\x{1F64F}]|[\x{1F680}-\x{1F6FF}]|[\x{2600}-\x{26FF}]\x{FE0F}?|[\x{2700}-\x{27BF}]\x{FE0F}?|\x{24C2}\x{FE0F}?|[\x{1F1E6}-\x{1F1FF}]{1,2}|[\x{1F170}\x{1F171}\x{1F17E}\x{1F17F}\x{1F18E}\x{1F191}-\x{1F19A}]\x{FE0F}?|[\\x{0023}\x{002A}\x{0030}-\x{0039}]\x{FE0F}?\x{20E3}|[\x{2194}-\x{2199}\x{21A9}-\x{21AA}]\x{FE0F}?|[\x{2B05}-\x{2B07}\x{2B1B}\x{2B1C}\x{2B50}\x{2B55}]\x{FE0F}?|[\x{2934}\x{2935}]\x{FE0F}?|[\x{3297}\x{3299}]\x{FE0F}?|[\x{1F201}\x{1F202}\x{1F21A}\x{1F22F}\x{1F232}\x{1F23A}\x{1F250}\x{1F251}]\x{FE0F}?|[\x{203C}-\x{2049}]\x{FE0F}?|[\x{00A9}-\x{00AE}]\x{FE0F}?|[\x{2122}\x{2139}]\x{FE0F}?|\x{1F004}\x{FE0F}?|\x{1F0CF}\x{FE0F}?|[\x{231A}\x{231B}\x{2328}\x{23CF}\x{23E9}\x{23F3}\x{23F8}\x{23FA}]\x{FE0F}?)
Inspired by zly394/EmojiRegex on GitHub.
Litecene: Full-Text Search for Google BigQuery
I just released the first beta version of litecene, a Java library that implements a common boolean search syntax for full-text search, with its first transpiler for BigQuery.
Litecene Syntax
Litecene uses a simple, user-friendly syntax that is derived from Lucene query syntax that should be familiar to most users. It includes clauses for term with wildcard, phrase with proximity, AND
, OR
, NOT
, and grouping.
As an example, this might be a good Litecene query to identify social media posts talking about common ways people user their smartphones:
(smartphone OR "smart phone" OR iphone OR "apple phone" OR android OR "google phone" OR "windows phone" OR "phone app"~8) AND (call* OR dial* OR app OR surf* OR brows* OR camera* OR pic* OR selfie)
The syntax is documented in more detail here.
Continue reading “Litecene: Full-Text Search for Google BigQuery”BigQuery Twitter Schema
I was investigating the feasibility of putting natively-formatted Twitter data into BigQuery, and got pretty far along the way before deciding to go another direction. I found the schema in the twitter-for-bigquery project to be incomplete for my needs, so I made a new schema of my own. I’m making the schema available here in case it’s of use to anyone else.
Continue reading “BigQuery Twitter Schema”