PostgreSQL DISTINCT ON with different ORDER BY PostgreSQL DISTINCT ON with different ORDER BY postgresql postgresql

PostgreSQL DISTINCT ON with different ORDER BY


Documentation says:

DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. [...] Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. [...] The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).

Official documentation

So you'll have to add the address_id to the order by.

Alternatively, if you're looking for the full row that contains the most recent purchased product for each address_id and that result sorted by purchased_at then you're trying to solve a greatest N per group problem which can be solved by the following approaches:

The general solution that should work in most DBMSs:

SELECT t1.* FROM purchases t1JOIN (    SELECT address_id, max(purchased_at) max_purchased_at    FROM purchases    WHERE product_id = 1    GROUP BY address_id) t2ON t1.address_id = t2.address_id AND t1.purchased_at = t2.max_purchased_atORDER BY t1.purchased_at DESC

A more PostgreSQL-oriented solution based on @hkf's answer:

SELECT * FROM (  SELECT DISTINCT ON (address_id) *  FROM purchases   WHERE product_id = 1  ORDER BY address_id, purchased_at DESC) tORDER BY purchased_at DESC

Problem clarified, extended and solved here: Selecting rows ordered by some column and distinct on another


You can order by address_id in an subquery, then order by what you want in an outer query.

SELECT * FROM     (SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*     FROM "purchases"     WHERE "purchases"."product_id" = 1 ORDER BY address_id DESC ) ORDER BY purchased_at DESC


A subquery can solve it:

SELECT *FROM  (    SELECT DISTINCT ON (address_id) *    FROM   purchases    WHERE  product_id = 1    ) pORDER  BY purchased_at DESC;

Leading expressions in ORDER BY have to agree with columns in DISTINCT ON, so you can't order by different columns in the same SELECT.

Only use an additional ORDER BY in the subquery if you want to pick a particular row from each set:

SELECT *FROM  (    SELECT DISTINCT ON (address_id) *    FROM   purchases    WHERE  product_id = 1    ORDER  BY address_id, purchased_at DESC  -- get "latest" row per address_id    ) pORDER  BY purchased_at DESC;

If purchased_at can be NULL, use DESC NULLS LAST - and match your index for best performance. See:

Related, with more explanation: