ISO 15924 Codes to Unicode Scripts… and Back Again!

The writing system a person uses can tell you a lot about them: where they are, what language(s) they speak, and so on. The two main standards for identifying writing systems are ISO 15924 and Unicode. They are closely related, but not every ISO 15924 entry has a corresponding Unicode script; the Afaka writing system (Afak) is one example.

There and Back Again

I just released a new dataset to help people map from one to the other and identify writing systems that are mostly of historical interest, and therefore less relevant in a modern context.
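To make the idea concrete, here is a minimal sketch of a two-way lookup in Python. The handful of entries is hand-picked for illustration and is not the released dataset; the function names are hypothetical. `None` marks an ISO 15924 code with no corresponding Unicode script, such as Afak.

```python
# Illustrative entries only; this is NOT the released dataset.
# Values are Unicode Script property names; None means "no Unicode script".
ISO_15924_TO_UNICODE = {
    "Latn": "Latin",
    "Cyrl": "Cyrillic",
    "Arab": "Arabic",
    "Hani": "Han",
    "Afak": None,  # Afaka is in ISO 15924 but has no Unicode script
}

# Reverse mapping, skipping codes that have no Unicode counterpart.
UNICODE_TO_ISO_15924 = {
    name: code
    for code, name in ISO_15924_TO_UNICODE.items()
    if name is not None
}


def to_unicode_script(code):
    """Return the Unicode script name for an ISO 15924 code, or None.

    Raises KeyError for codes that are not in the table at all, so that
    "unknown code" and "known code without a Unicode script" stay distinct.
    """
    return ISO_15924_TO_UNICODE[code]


def to_iso_15924(script_name):
    """Return the ISO 15924 code for a Unicode script name."""
    return UNICODE_TO_ISO_15924[script_name]
```

Keeping "unknown" (a `KeyError`) separate from "known, but unmapped" (`None`) matters here, because the unmapped entries are exactly the interesting edge cases.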

Continue reading “ISO 15924 Codes to Unicode Scripts… and Back Again!”

emoji4j v15.0.1 Released

A new version, v15.0.1, of emoji4j, my emoji processing library for Java 8+, just dropped. Here are the updates:

  • Update to Unicode 15
  • New method GraphemeMatcher#results()
  • Improved documentation
  • Even more tests

There is also now a Cookbook in the emoji4j wiki to help users solve hard or common problems with emoji4j.

Enjoy!

Parsing Rich Social Media Text

If you ever need to convert “plain” social media text like:

Hey @importantguy, check out my project https://www.mycoolproject.com/ #PrettyPlease

into “rich” social media text, where the mention, link, and hashtag are clickable, like:

Hey @importantguy, check out my project https://www.mycoolproject.com/ #PrettyPlease

Then you need a social text parsing library. The industry standard is twitter/twitter-text, but it doesn’t work for everything. (For example, it only parses valid Twitter mentions, but will not parse all valid TikTok mentions, since those screen names can contain a “.”.) So while you may need to customize for your specific use case(s), this post should at least give you a good starting point.
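As a rough illustration of what such a parser does, here is a minimal regex-based sketch in Python. It is not twitter-text and does not follow its rules; the patterns are deliberately simplified, and the `profile_base` and `tag_base` URLs are placeholders made up for the example.

```python
import html
import re

# Simplified, hypothetical patterns. Real-world rules are far more involved:
# for instance, this URL pattern will swallow trailing punctuation.
ENTITY = re.compile(
    r"(?P<url>https?://\S+)"
    r"|(?P<mention>@[A-Za-z0-9_]+)"
    r"|(?P<hashtag>#[A-Za-z0-9_]+)"
)


def find_entities(text):
    """Return a (kind, token) pair for each entity, in order of appearance."""
    return [(m.lastgroup, m.group()) for m in ENTITY.finditer(text)]


def to_html(text,
            profile_base="https://example.com/",
            tag_base="https://example.com/hashtag/"):
    """Escape the text and wrap each entity in an <a> element."""
    out = []
    pos = 0
    for m in ENTITY.finditer(text):
        # Plain text between the previous entity and this one.
        out.append(html.escape(text[pos:m.start()]))
        token = m.group()
        if m.lastgroup == "url":
            href = token
        elif m.lastgroup == "mention":
            href = profile_base + token[1:]
        else:
            href = tag_base + token[1:]
        out.append('<a href="%s">%s</a>'
                   % (html.escape(href, quote=True), html.escape(token)))
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

Supporting TikTok-style screen names in this sketch would mean changing only the `mention` pattern to allow a “.”, which is the kind of per-platform customization described above.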

Continue reading “Parsing Rich Social Media Text”

Techniques for Solving the Toughest Problems

I’m a software architect, software engineer, data scientist, and analyst by trade. So far in my career, I have faced more and different problems than the average bear. This is a list of some of the techniques I have used to crack some of the harder nuts in my career. They’re pretty simple, so don’t expect any major revelations, and they’re obviously focused on my own experience as a technical knowledge worker. I’ll probably add some new ones over time, too. But I hope they’ll be useful to others, no matter their background.

Rubber Duckie, You’re the One

I certainly did not invent rubber duck debugging, the practice of explaining your code to a theoretical (or real) rubber duckie, but I’m a big believer in it.

You make coding lots of fun!

Explaining your thinking and logic to another person forces you to structure and justify your thinking, which often helps tease apart related concepts and clarify reasoning (or lack thereof). And it turns out that it doesn’t even matter if the other person talks back!

Continue reading “Techniques for Solving the Toughest Problems”

AWS Step Functions Distributed Map ResultWriter Example

I’m using AWS Step Functions to do some complex orchestration of services that could span more than 25,000 state transitions and exchange data sets larger than 256KB, so I’m making heavy use of the new distributed map feature. It definitely makes things easier than the old everything-is-a-child-execution approach! However, the ResultWriter field is not particularly well-documented, so I’m hoping to shed some light on it here with a simple example.
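For orientation, here is a minimal sketch of where ResultWriter sits inside a distributed map state, built as a Python dict and serialized to Amazon States Language JSON. This is my own illustration rather than the example from the post: the state names, bucket, and prefix are placeholders, and the child workflow is a trivial Pass state.

```python
import json


def build_definition(bucket, prefix):
    """Build a state machine definition with one distributed map state.

    The bucket and prefix are caller-supplied placeholders; the state
    names ("ProcessItems", "ProcessOne") are arbitrary.
    """
    return {
        "StartAt": "ProcessItems",
        "States": {
            "ProcessItems": {
                "Type": "Map",
                # DISTRIBUTED mode runs each item as a child workflow execution.
                "ItemProcessor": {
                    "ProcessorConfig": {
                        "Mode": "DISTRIBUTED",
                        "ExecutionType": "STANDARD",
                    },
                    "StartAt": "ProcessOne",
                    "States": {
                        "ProcessOne": {"Type": "Pass", "End": True},
                    },
                },
                # ResultWriter exports the child execution results to S3
                # instead of returning them inline in the state output.
                "ResultWriter": {
                    "Resource": "arn:aws:states:::s3:putObject",
                    "Parameters": {"Bucket": bucket, "Prefix": prefix},
                },
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition("example-results-bucket", "example-prefix"), indent=2))
```

As I understand it, with ResultWriter configured the map state's output describes where the results landed in S3 rather than carrying the results themselves, which is what lets the workflow stay under the payload size limit.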

Simple Example Step Function
Continue reading “AWS Step Functions Distributed Map ResultWriter Example”

Community-Managed OpenAPI Spec for Pinecone API

The excellent vector database Pinecone has a very useful API, but client support is sparse. While the API and its clients are in theory based on an OpenAPI spec, no one seems to be able to find it.

But I needed an OpenAPI spec for the API, so I reverse engineered one (read: copied and pasted from the documentation).

It is very new and so should be considered very experimental. If you use it and find a bug, please open an issue, or even better submit a pull request!

Community-Managed AWS Lambda Base Images for Java 20

I’ve added a new custom base image for Java 20 on Lambda to complement the community base images already available for Java 17, Java 18, and Java 19. You can find the images on the ECR Public Gallery and DockerHub and the source code on GitHub. All the new features in Java 20 are in preview or incubator, as befits a non-LTS release, but for those who like to live life on the bleeding edge, there are lots of new toys to play with. These base images will let you get started.

Continue reading “Community-Managed AWS Lambda Base Images for Java 20”

Adding Code Blocks with Syntax Highlighting to WebFlow

EDIT: WebFlow now supports Code Blocks out of the box! However, it only supports them in Pages, not in CMS Collection Entries, so it’s not a total solution. This article’s approach works in CMS Collection Entries, so if you need code blocks there, read on!

I’m in the process of developing the marketing website for Arachnio. Being an API product, I need to embed code in some of the site’s pages and blog posts. The website is built on WebFlow, which I have found to be generally outstanding, but it does not support inline code or code blocks out of the box. Here’s how I got code blocks with syntax highlighting working in WebFlow.

Continue reading “Adding Code Blocks with Syntax Highlighting to WebFlow”