The excellent PostgreSQL database supports the following syntax for the common upsert SQL pattern:
INSERT INTO table(id, column1, column2)
VALUES (@id, @value1, @value2)
ON CONFLICT (id) DO UPDATE
SET
column1=EXCLUDED.value1,
column2=EXCLUDED.value2
RETURNING *
With this syntax, you can insert a new row with the given values, or update the existing row to the given values, using the row’s primary key (or other indexed key). Very handy!
But did the resulting upserted row insert or update?
There’s no (obvious) way to tell if the row was created or updated. It’s tempting to use tricks like check the value of an EXCLUDED
column in the RETURNING
clause, except that it can’t be used in that scope. And this is a very practical question: this matters for a variety of systems, e.g., operations with “put” semantics. Absent this output from the query, the workaround is to use two queries inside of a transaction or checkpoint instead of just one query, which is a bummer.
Fortunately, I finally found a good solution from the wonderful Erwin Brandstetter. (This is a question about Postgres. Of course Erwin was the solver!) Here’s the answer, using the same example from above:
INSERT INTO table(id, column1, column2)
VALUES (@id, @value1, @value2)
ON CONFLICT (id) DO UPDATE
SET
column1=EXCLUDED.value1,
column2=EXCLUDED.value2
RETURNING *, (xmax = 0) AS _created
In this solution, a new boolean-valued column, _created
, is added to the result that is true
if the column was created, or false
otherwise (i.e., if it was updated). There is a detailed explanation from Erwin at the above link, but the gist is that xmax
is a row-level system column that contains a row lock ID (in queries like this one), and a value of 0
indicates that this row has no row lock, which can only happen if it has been created. (An updated row has to take a row lock, per the I in ACID.)
A different answer cautions (fairly) that this relies on undocumented details of the xmax
column implementation, and therefore could change SomedayTM, but it’s unlikely because of the way Postgres works. For my part, I’ll use it, but make sure I have some tests in the project to catch any behavior change on database update.
As a bonus (and a note to my future self), here is a good way to use that query for PATCH
semantics, where only non-NULL
values are used to update an existing row:
INSERT INTO table AS EXISTING(id, column1, column2)
VALUES (@id, @value1, @value2)
ON CONFLICT (id)
DO UPDATE SET
column1=COALESCE(EXCLUDED.value1, EXISTING.value1),
column2=COALESCE(EXCLUDED.value2, EXISTING.value2)
RETURNING *, (xmax = 0) AS _created
Here, the EXISTING
table alias is a nod to the special EXCLUDED
alias, which is used to refer to the column that was “excluded” from INSERT
.