Social Media - data, code and conversation

Posted on 2026-04-27 (updated 2026-04-28)

Subreddit History Is a Surprisingly Good Lie Detector

Depending on your perspective, Reddit is either one of the last great online communities or a petri dish for the internet’s most cutting-edge bots. Or, maybe, both.

One thing that’s become increasingly hard to ignore is how much conversation on Reddit is shaped by accounts that don’t really read like people. The scale of the problem is obvious to everyone: bot networks, coordinated astroturfing campaigns, accounts for sale, and now AI slop… But the tools Reddit gives users to assess account credibility are thin at best. Reddit lets you click a username and see when the account was created and its karma score. That’s about it. And that’s not very useful when karma is literally for sale.

I built Reddit Contextualizer (Chrome, Firefox) to help users assess account credibility more easily.


Posted on 2023-06-13 (updated 2023-07-15)

Parsing Rich Social Media Text

If you ever need to convert “plain” social media text like:

Hey @importantguy, check out my project https://www.mycoolproject.com/ #PrettyPlease

into “rich” social media text, where the mention, the URL, and the hashtag each become a clickable link, like:

Hey @importantguy, check out my project https://www.mycoolproject.com/ #PrettyPlease

then you need a social text parsing library. The industry standard is twitter/twitter-text, but it doesn’t work for everything. (For example, it only parses valid Twitter mentions, so it will miss some valid TikTok mentions, since those screen names can contain a “.”.) So while you may need to customize for your specific use case(s), this post should at least give you a good starting point.
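To make the idea concrete, here is a minimal sketch in Python of what a parser like this does. It is not twitter-text and not the code from the full post: the regular expressions are deliberately simplified, the function names (build_pattern, find_entities, linkify) are made up for illustration, and the example.com link targets are placeholders. The allow_dots flag shows the TikTok case mentioned above, where a screen name can contain a “.”.

```python
import html
import re

# Placeholder link targets (assumptions); swap in the real platform URLs.
LINK_TEMPLATES = {
    "mention": "https://example.com/@{}",
    "hashtag": "https://example.com/tag/{}",
}


def build_pattern(allow_dots: bool = False) -> re.Pattern:
    """Build one regex that matches URLs, mentions, and hashtags."""
    # With allow_dots, a name may contain "." but may not end with one,
    # so "@some.user." yields "some.user" and leaves the final period alone.
    name = r"\w(?:[\w.]*\w)?" if allow_dots else r"\w+"
    # The lookbehinds keep email addresses and "a#b" from matching.
    mention = rf"(?<![\w@])@(?P<mention>{name})"
    hashtag = r"(?<![\w#])#(?P<hashtag>\w+)"
    # A URL runs to the next whitespace, minus trailing punctuation.
    url = r"(?P<url>https?://[^\s<>\"']*[^\s<>\"'.,;:!?)])"
    # URLs go first so an "@" or "#" inside a URL is not parsed separately.
    return re.compile("|".join([url, mention, hashtag]))


def find_entities(text: str, allow_dots: bool = False) -> list:
    """Return (kind, value) pairs in the order they appear in text."""
    pattern = build_pattern(allow_dots)
    return [(m.lastgroup, m.group(m.lastgroup)) for m in pattern.finditer(text)]


def linkify(text: str, allow_dots: bool = False) -> str:
    """Convert plain text to HTML with each entity wrapped in a link."""
    pattern = build_pattern(allow_dots)
    parts, pos = [], 0
    for m in pattern.finditer(text):
        kind, value = m.lastgroup, m.group(m.lastgroup)
        href = value if kind == "url" else LINK_TEMPLATES[kind].format(value)
        # Escape everything that did not come from our own markup.
        parts.append(html.escape(text[pos:m.start()]))
        parts.append(f'<a href="{html.escape(href)}">{html.escape(m.group(0))}</a>')
        pos = m.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)
```

Calling linkify on the sample text above wraps @importantguy, the URL, and #PrettyPlease in anchor tags and leaves the rest of the sentence untouched. Calling find_entities("thanks @some.user.", allow_dots=True) picks up some.user as the mention, while the default setting stops at some. Real platforms have stricter rules (length limits, allowed characters, Unicode handling), so treat the patterns as a starting point only.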


Posted on 2020-12-26 (updated 2022-05-17)

BigQuery Twitter Schema

I was investigating the feasibility of putting natively-formatted Twitter data into BigQuery, and got pretty far along the way before deciding to go another direction. I found the schema in the twitter-for-bigquery project to be incomplete for my needs, so I made a new schema of my own. I’m making the schema available here in case it’s of use to anyone else.
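For readers who haven’t seen one, here is a rough sketch of the general shape such a schema takes. This is not the schema from the full post, just a tiny illustrative fragment: it covers a handful of fields from the classic (v1.1) tweet object, written in BigQuery’s JSON schema format (name, type, mode, and nested fields for RECORD columns). The field helper and the choice of fields are assumptions made for the example.

```python
import json


def field(name, type_, mode="NULLABLE", fields=None):
    """Build one BigQuery schema entry (hypothetical helper)."""
    spec = {"name": name, "type": type_, "mode": mode}
    if fields:
        # Only RECORD columns carry nested field definitions.
        spec["fields"] = fields
    return spec


# Illustrative subset only; a complete tweet schema is far larger.
TWEET_SCHEMA = [
    field("id_str", "STRING", "REQUIRED"),
    # Kept as STRING because the native created_at format
    # ("Wed Oct 10 20:19:24 +0000 2018") is not a BigQuery timestamp literal.
    field("created_at", "STRING"),
    field("text", "STRING"),
    field("user", "RECORD", fields=[
        field("id_str", "STRING"),
        field("screen_name", "STRING"),
        field("followers_count", "INTEGER"),
    ]),
    field("entities", "RECORD", fields=[
        # REPEATED RECORD models the array of hashtag objects.
        field("hashtags", "RECORD", "REPEATED", fields=[
            field("text", "STRING"),
            field("indices", "INTEGER", "REPEATED"),
        ]),
    ]),
]

schema_json = json.dumps(TWEET_SCHEMA, indent=2)
```

Writing schema_json to a file gives a schema definition in the format BigQuery accepts when creating a table. Nested objects map to RECORD columns and arrays map to REPEATED mode, which is what lets the data keep its native shape.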
