Atomic UPDATE .. SELECT in Postgres

multithreading postgresql concurrency race-condition transaction-isolation

While Erwin's suggestion is possibly the simplest way to get correct behavior (so long as you retry your transaction if you get an exception with SQLSTATE 40001), queuing applications by their nature tend to work better when requests block and wait for their turn at the queue. The PostgreSQL implementation of SERIALIZABLE transactions allows higher concurrency and is somewhat more "optimistic" about the chances of collision, which suits a queue less well.

The example query in the question, as it stands, in the default READ COMMITTED transaction isolation level would allow two (or more) concurrent connections to both "claim" the same row from the queue. What will happen is this:

T1 starts and gets as far as locking the row in the UPDATE phase.
T2 overlaps T1 in execution time and attempts to update that row. It blocks pending the COMMIT or ROLLBACK of T1.
T1 commits, having successfully "claimed" the row.
T2 tries to update the row, finds that T1 already has, looks for the new version of the row, finds that it still satisfies the selection criteria (which is just that id matches), and also "claims" the row.

It can be modified to work correctly (if you are using a version of PostgreSQL which allows the FOR UPDATE clause in a subquery). Just add FOR UPDATE to the end of the subquery which selects the id, and this will happen:

T1 starts and now locks the row before selecting the id.
T2 overlaps T1 in execution time and blocks while trying to select an id, pending the COMMIT or ROLLBACK of T1.
T1 commits, having successfully "claimed" the row.
By the time T2 is able to read the row to see the id, it sees that it has been claimed, so it finds the next available id.
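As a minimal sketch of that change, assuming the table stuff and the column computed used in the other answer (the question's own query is not reproduced here), the locking variant might look like this:

```sql
-- Assumes a table "stuff" with columns "id" and "computed", as in the other answer.
UPDATE stuff
SET    computed = 'working'
WHERE  id = (
   SELECT id
   FROM   stuff
   WHERE  computed IS NULL
   LIMIT  1
   FOR    UPDATE      -- lock the candidate row inside the subquery
   )
RETURNING *;
```

In READ COMMITTED, a second session running the same statement waits on the row lock in the subquery, re-checks computed IS NULL once the first session commits, and moves on to another row.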

At the REPEATABLE READ or SERIALIZABLE transaction isolation level, the write conflict would throw an error, which you could catch and determine was a serialization failure based on the SQLSTATE, and retry.

If you generally want SERIALIZABLE transactions but you want to avoid retries in the queuing area, you might be able to accomplish that by using an advisory lock.
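A rough sketch of the advisory-lock idea follows. The key 42 is an arbitrary, hypothetical value that all queue consumers would have to agree on, and the table stuff is borrowed from the other answer. The lock is taken at session level before BEGIN so that, as far as I understand it, the transaction's snapshot is only established after the lock has been granted:

```sql
-- Hypothetical lock key 42: every consumer of this queue must use the same key.
SELECT pg_advisory_lock(42);          -- blocks until no other session holds the key

BEGIN ISOLATION LEVEL SERIALIZABLE;

UPDATE stuff
SET    computed = 'working'
WHERE  id = (SELECT id FROM stuff WHERE computed IS NULL LIMIT 1)
RETURNING *;

COMMIT;

SELECT pg_advisory_unlock(42);        -- session-level locks are not released by COMMIT
```

This serializes only the claiming step. Other work in the same SERIALIZABLE transaction can still fail with SQLSTATE 40001 and would still need a retry.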


If you are the only user, the query should be fine. In particular, there is no race condition or deadlock within the query itself (between the outer query and the subquery). I quote the manual here:

However, a transaction never conflicts with itself.

For concurrent use, the matter may be more complicated. You would be on the safe side with SERIALIZABLE transaction mode:

BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE stuff
SET    computed = 'working'
WHERE  id = (SELECT id FROM stuff WHERE computed IS NULL LIMIT 1)
RETURNING *;
COMMIT;

You need to prepare for serialization failures and retry your query in such a case.

But I am not entirely sure if this isn't overkill. I'll ask @kgrittn to stop by .. he is the expert with concurrency and serializable transactions ..

And he did. :)

Best of both worlds

Run the query in default transaction mode READ COMMITTED.

For Postgres 9.5 or later use FOR UPDATE SKIP LOCKED. See:

Postgres UPDATE … LIMIT 1
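As a standalone statement, the SKIP LOCKED variant could be sketched like this (same table and column as above; the RETURNING clause is optional and only there to hand the claimed row back to the caller):

```sql
-- Postgres 9.5 or later, default READ COMMITTED isolation.
UPDATE stuff
SET    computed = 'working'
WHERE  id = (
   SELECT id
   FROM   stuff
   WHERE  computed IS NULL
   LIMIT  1
   FOR    UPDATE SKIP LOCKED   -- skip rows another transaction has locked
   )
RETURNING *;
```

Concurrent sessions do not wait for each other here. Each one skips rows that are currently locked and takes the next free one, if any.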

For older versions recheck the condition computed IS NULL explicitly in the outer UPDATE:

UPDATE stuff
SET    computed = 'working'
WHERE  id = (SELECT id FROM stuff WHERE computed IS NULL LIMIT 1)
AND    computed IS NULL;

As @kgrittn advised in a comment on his answer, this query could come up empty, without having done anything, in the (unlikely) case that it got intertwined with a concurrent transaction.

Therefore, it would work much like the first variant in transaction mode SERIALIZABLE: you would have to retry, just without the performance penalty.

The only problem: while the conflict is very unlikely because the window of opportunity is so tiny, it can happen under heavy load. An empty result therefore does not tell you for sure whether there really are no rows left.

If that does not matter (like in your case), you are done here.
If it does, to be absolutely sure, start one more query with explicit locking after you get an empty result. If this comes up empty, you are done. If not, continue.
In plpgsql it could look like this:

LOOP
   UPDATE stuff
   SET    computed = 'working'
   WHERE  id = (SELECT id FROM stuff WHERE computed IS NULL
                LIMIT 1 FOR UPDATE SKIP LOCKED);  -- pg 9.5+
   -- WHERE  id = (SELECT id FROM stuff WHERE computed IS NULL LIMIT 1)
   -- AND    computed IS NULL; -- pg 9.4-

   CONTINUE WHEN FOUND;  -- continue outside loop, may be a nested loop

   UPDATE stuff
   SET    computed = 'working'
   WHERE  id = (SELECT id FROM stuff WHERE computed IS NULL
                LIMIT 1 FOR UPDATE);

   EXIT WHEN NOT FOUND;  -- exit function (end)
END LOOP;

That should give you the best of both worlds: performance and reliability.
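To show how the loop fragment might sit inside a complete function, here is one possible wrapper for Postgres 9.5 or later. The function name claim_all_stuff and the counter claimed are made up for illustration. A real consumer would more likely process each row as it is claimed rather than just count them:

```sql
-- Hypothetical wrapper; the function name and the counter are illustrative only.
CREATE OR REPLACE FUNCTION claim_all_stuff()
  RETURNS integer
  LANGUAGE plpgsql AS
$func$
DECLARE
   claimed integer := 0;  -- rows this call has marked as 'working'
BEGIN
   LOOP
      -- Fast path: take an unclaimed row that no other transaction has locked.
      UPDATE stuff
      SET    computed = 'working'
      WHERE  id = (SELECT id FROM stuff WHERE computed IS NULL
                   LIMIT 1 FOR UPDATE SKIP LOCKED);

      IF FOUND THEN
         claimed := claimed + 1;
         CONTINUE;
      END IF;

      -- Slow path: wait for locked rows to be sure nothing is left.
      UPDATE stuff
      SET    computed = 'working'
      WHERE  id = (SELECT id FROM stuff WHERE computed IS NULL
                   LIMIT 1 FOR UPDATE);

      EXIT WHEN NOT FOUND;  -- nothing left even after waiting
      claimed := claimed + 1;
   END LOOP;

   RETURN claimed;
END
$func$;
```

It could then be called with SELECT claim_all_stuff(); and returns the number of rows the call claimed.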
