Postgres Upsert: Created or Updated? - data, code and conversation

The excellent PostgreSQL database supports the following syntax for the common upsert SQL pattern:

INSERT INTO table(id, column1, column2)
VALUES (@id, @value1, @value2)
ON CONFLICT (id) DO UPDATE
SET
    column1=EXCLUDED.value1,
    column2=EXCLUDED.value2
RETURNING *

With this syntax, you can insert a new row with the given values, or update the existing row to the given values, using the row’s primary key (or other indexed key). Very handy!

But did the resulting upserted row insert or update?

There’s no (obvious) way to tell if the row was created or updated. It’s tempting to use tricks like check the value of an EXCLUDED column in the RETURNING clause, except that it can’t be used in that scope. And this is a very practical question: this matters for a variety of systems, e.g., operations with “put” semantics. Absent this output from the query, the workaround is to use two queries inside of a transaction or checkpoint instead of just one query, which is a bummer.

Fortunately, I finally found a good solution from the wonderful Erwin Brandstetter. (This is a question about Postgres. Of course Erwin was the solver!) Here’s the answer, using the same example from above:

INSERT INTO table(id, column1, column2)
VALUES (@id, @value1, @value2)
ON CONFLICT (id) DO UPDATE
SET
    column1=EXCLUDED.value1,
    column2=EXCLUDED.value2
RETURNING *, (xmax = 0) AS _created

In this solution, a new boolean-valued column, _created, is added to the result that is true if the column was created, or false otherwise (i.e., if it was updated). There is a detailed explanation from Erwin at the above link, but the gist is that xmax is a row-level system column that contains a row lock ID (in queries like this one), and a value of 0 indicates that this row has no row lock, which can only happen if it has been created. (An updated row has to take a row lock, per the I in ACID.)

A different answer cautions (fairly) that this relies on undocumented details of the xmax column implementation, and therefore could change Someday^TM, but it’s unlikely because of the way Postgres works. For my part, I’ll use it, but make sure I have some tests in the project to catch any behavior change on database update.

As a bonus (and a note to my future self), here is a good way to use that query for PATCH semantics, where only non-NULL values are used to update an existing row:

INSERT INTO table AS EXISTING(id, column1, column2)
VALUES (@id, @value1, @value2)
ON CONFLICT (id)
DO UPDATE SET
     column1=COALESCE(EXCLUDED.value1, EXISTING.value1),
     column2=COALESCE(EXCLUDED.value2, EXISTING.value2)
RETURNING *, (xmax = 0) AS _created

Here, the EXISTING table alias is a nod to the special EXCLUDED alias, which is used to refer to the column that was “excluded” from INSERT.