data, code and conversation - from Andy Boothe

2023-09-13 2023-09-14

Announcing HumanGraphics!

HumanGraphics is a REST API that converts names, locations, and faces into structured data and demographics estimates. Sound interesting? Take it for a test drive!

And it launched today! Check out the website for a special launch offer!

2023-08-10 2023-08-10

Postgres Upsert: Created or Updated?

The excellent PostgreSQL database supports the following syntax for the common upsert SQL pattern:

INSERT INTO my_table(id, column1, column2)
VALUES (@id, @value1, @value2)
ON CONFLICT (id) DO UPDATE
SET
    column1=EXCLUDED.column1,
    column2=EXCLUDED.column2
RETURNING *

With this syntax, you can insert a new row with the given values, or update the existing row to the given values, using the row’s primary key (or other indexed key). Very handy!

But was the resulting row inserted or updated?
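One portable way to answer that question is sketched below. It is not necessarily the approach the full post takes: it assumes you can add a hypothetical `version` column that starts at 1 on insert and is incremented on every conflict, so a value of 1 after the upsert means the row was just created. The demo uses Python's built-in `sqlite3` module so it runs anywhere; the table name `items` and the helper `upsert` are made up for illustration.

```python
import sqlite3


def upsert(conn, item_id, value1, value2):
    """Upsert a row and report whether it was created or updated.

    Assumes a hypothetical `version` column: 1 on insert, +1 on each conflict.
    """
    conn.execute(
        """
        INSERT INTO items (id, column1, column2, version)
        VALUES (?, ?, ?, 1)
        ON CONFLICT (id) DO UPDATE
        SET column1 = excluded.column1,
            column2 = excluded.column2,
            version = version + 1
        """,
        (item_id, value1, value2),
    )
    # Read the counter back; 1 means this call inserted the row.
    (version,) = conn.execute(
        "SELECT version FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    return "created" if version == 1 else "updated"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT, "
    "version INTEGER NOT NULL)"
)

first = upsert(conn, 1, "a", "b")   # no row with id 1 yet
second = upsert(conn, 1, "c", "d")  # conflicts with the row above
row = conn.execute("SELECT column1, column2 FROM items WHERE id = 1").fetchone()
```

The trade-off is an extra column on the table. Other databases may require the `version` reference on the right-hand side to be qualified with the table name, so treat the SQL as a starting point rather than a drop-in statement.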

2023-07-30 2023-07-30

ISO 15924 Codes to Unicode Scripts… and Back Again!

The writing system a person uses can tell you a lot about them: where they are, what language(s) they speak, and so on. The two main standards for writing systems are ISO 15924 and Unicode. They are closely related to each other, but not every ISO 15924 entry has a corresponding Unicode script, e.g., the Afaka writing system (Afak).

I just released a new dataset to help people map from one to the other and identify writing systems that are mostly of historical interest, and therefore less relevant in a modern context.
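To make the shape of such a mapping concrete, here is a minimal sketch. It is a hand-picked illustrative subset, not the released dataset, and the names `ISO_15924_TO_UNICODE`, `UNICODE_TO_ISO_15924`, and `to_unicode_script` are invented for this example. `None` marks an ISO 15924 code with no corresponding Unicode script, as the text above describes for Afaka.

```python
from typing import Optional

# Illustrative subset only; the real dataset covers far more codes.
# Keys are ISO 15924 four-letter codes, values are Unicode script names.
ISO_15924_TO_UNICODE = {
    "Latn": "Latin",
    "Cyrl": "Cyrillic",
    "Arab": "Arabic",
    "Hani": "Han",
    "Afak": None,  # Afaka: listed in ISO 15924, no Unicode script (per the text above)
}

# Reverse direction, skipping codes that have no Unicode counterpart.
UNICODE_TO_ISO_15924 = {
    script: code
    for code, script in ISO_15924_TO_UNICODE.items()
    if script is not None
}


def to_unicode_script(code: str) -> Optional[str]:
    """Return the Unicode script name for an ISO 15924 code, or None if there is none.

    Raises KeyError for codes missing from this small sample table.
    """
    return ISO_15924_TO_UNICODE[code]
```

Keeping "known code without a Unicode script" (`None`) distinct from "unknown code" (`KeyError`) avoids silently treating typos as unencoded scripts.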

2023-07-15 2023-07-15

Popular Names by Country Dataset

I needed a dataset of popular names by country for testing, but I couldn’t find one that had everything I needed. So I made my own!

Need a free dataset of popular names by country, including CJK and RTL examples, plus romanization and counts, all for a boatload of countries? Me too! Keep reading to hear more about what I put together.

2023-06-29 2024-01-05

Regex for 50 US States

Need a regular expression to recognize the 50 US state names or USPS abbreviations? Here it is!
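As a rough illustration of what such a pattern can look like, the sketch below builds one in Python from a name-to-abbreviation table. It is not the regex from the full post. The design choices are assumptions of this example: full names match case-insensitively, abbreviations match only in uppercase (so ordinary words like "or" and "in" are not flagged), and DC and the territories are left out.

```python
import re

# The 50 states and their USPS abbreviations (no DC, no territories).
STATES = {
    "Alabama": "AL", "Alaska": "AK", "Arizona": "AZ", "Arkansas": "AR",
    "California": "CA", "Colorado": "CO", "Connecticut": "CT",
    "Delaware": "DE", "Florida": "FL", "Georgia": "GA", "Hawaii": "HI",
    "Idaho": "ID", "Illinois": "IL", "Indiana": "IN", "Iowa": "IA",
    "Kansas": "KS", "Kentucky": "KY", "Louisiana": "LA", "Maine": "ME",
    "Maryland": "MD", "Massachusetts": "MA", "Michigan": "MI",
    "Minnesota": "MN", "Mississippi": "MS", "Missouri": "MO",
    "Montana": "MT", "Nebraska": "NE", "Nevada": "NV",
    "New Hampshire": "NH", "New Jersey": "NJ", "New Mexico": "NM",
    "New York": "NY", "North Carolina": "NC", "North Dakota": "ND",
    "Ohio": "OH", "Oklahoma": "OK", "Oregon": "OR", "Pennsylvania": "PA",
    "Rhode Island": "RI", "South Carolina": "SC", "South Dakota": "SD",
    "Tennessee": "TN", "Texas": "TX", "Utah": "UT", "Vermont": "VT",
    "Virginia": "VA", "Washington": "WA", "West Virginia": "WV",
    "Wisconsin": "WI", "Wyoming": "WY",
}

# Longest names first, so "West Virginia" is tried before "Virginia".
# Spaces inside names become \s+ to tolerate runs of whitespace.
_names = sorted(STATES, key=len, reverse=True)
_name_alt = "|".join(name.replace(" ", r"\s+") for name in _names)
_abbr_alt = "|".join(sorted(STATES.values()))

# (?i:...) scopes case-insensitivity to the names; abbreviations stay uppercase.
STATE_RE = re.compile(rf"\b(?:(?i:{_name_alt})|{_abbr_alt})\b")


def find_states(text):
    """Return every state name or USPS abbreviation found in text, in order."""
    return [m.group(0) for m in STATE_RE.finditer(text)]
```

For example, `find_states("Flights from new york to Portland, OR, or maybe Kansas.")` picks up the name, the uppercase abbreviation, and "Kansas", while skipping the lowercase word "or". The word boundaries also keep "Kansas" from matching inside "Arkansas".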

2023-06-28 2023-07-15

emoji4j v15.0.1 Released

A new version v15.0.1 of my emoji processing library, emoji4j, for Java 8+ just dropped. Here are the updates:

Update to Unicode 15
New method GraphemeMatcher#results()
Improved documentation
Even more tests

There is also now a Cookbook in the emoji4j wiki to help users solve hard or common problems with emoji4j.

Enjoy!

2023-05-03 2023-06-22

Techniques for Solving the Toughest Problems

I’m a software architect, software engineer, data scientist, and analyst by trade. So far in my career, I have faced more and different problems than the average bear. This is a list of some of the techniques I have used to crack some of the harder nuts in my career. They’re pretty simple, so don’t expect any major revelations, and they’re obviously focused on my own experience as a technical knowledge worker. I’ll probably add some new ones over time, too. But I hope they’ll be useful to others, no matter their background.

Rubber Duckie, You’re the One

I certainly did not invent rubber duck debugging, the practice of explaining your code to a theoretical (or real) rubber duckie, but I’m a big believer in it.

Explaining your thinking and logic to another person forces you to structure and justify your thinking, which often helps tease apart related concepts and clarify reasoning (or lack thereof). And it turns out that it doesn’t even matter if the other person talks back!

2023-04-27 2024-10-09

AWS Step Functions Distributed Map ResultWriter Example

I’m using AWS Step Functions to do some complex orchestration of services that could span more than 25,000 state transitions and exchange data sets larger than 256KB, so I’m making heavy use of the new distributed map feature. It definitely makes things easier than the old everything-is-a-child-execution approach! However, the ResultWriter field is not particularly well-documented, so I’m hoping to shed some light on it here with a simple example.
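For orientation, here is a small sketch of the general shape of a `ResultWriter` field, built as a Python dict and printed as JSON. The field names reflect my understanding of the AWS documentation for exporting distributed map results to S3, so please verify them against the current docs before relying on them. The bucket name, prefix, and the helper `result_writer` are hypothetical.

```python
import json


def result_writer(bucket: str, prefix: str) -> dict:
    """Build a ResultWriter fragment that sends map run results to S3.

    Field names are assumed from the AWS docs; bucket and prefix are
    placeholders supplied by the caller.
    """
    return {
        "Resource": "arn:aws:states:::s3:putObject",
        "Parameters": {
            "Bucket": bucket,
            "Prefix": prefix,
        },
    }


# Hypothetical values; this fragment would sit inside a distributed Map state.
fragment = {"ResultWriter": result_writer("my-results-bucket", "map-runs/")}
print(json.dumps(fragment, indent=2))
```

The execution role also needs permission to write to the target bucket, which this fragment does not cover.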

2023-04-01 2023-04-03

Community-Managed OpenAPI Spec for Pinecone API

The excellent vector database Pinecone has a very useful API, but client support is sparse. While the API and its clients are in theory based on an OpenAPI spec, no one seems to be able to find it.

But I needed an OpenAPI spec for the API, so I reverse engineered one (read: copied and pasted from the documentation).

It is very new and so should be considered very experimental. If you use it and find a bug, please open an issue, or, even better, submit a pull request!