PostgreSQL DISTINCT ON with different ORDER BY
Documentation says:
DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. [...] Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. [...] The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).
So you'll have to add the address_id
to the order by.
Alternatively, if you're looking for the full row that contains the most recent purchased product for each address_id
and that result sorted by purchased_at
then you're trying to solve a greatest N per group problem which can be solved by the following approaches:
The general solution that should work in most DBMSs:
SELECT t1.* FROM purchases t1JOIN ( SELECT address_id, max(purchased_at) max_purchased_at FROM purchases WHERE product_id = 1 GROUP BY address_id) t2ON t1.address_id = t2.address_id AND t1.purchased_at = t2.max_purchased_atORDER BY t1.purchased_at DESC
A more PostgreSQL-oriented solution based on @hkf's answer:
SELECT * FROM ( SELECT DISTINCT ON (address_id) * FROM purchases WHERE product_id = 1 ORDER BY address_id, purchased_at DESC) tORDER BY purchased_at DESC
Problem clarified, extended and solved here: Selecting rows ordered by some column and distinct on another
You can order by address_id in an subquery, then order by what you want in an outer query.
SELECT * FROM (SELECT DISTINCT ON (address_id) purchases.address_id, purchases.* FROM "purchases" WHERE "purchases"."product_id" = 1 ORDER BY address_id DESC ) ORDER BY purchased_at DESC
A subquery can solve it:
SELECT *FROM ( SELECT DISTINCT ON (address_id) * FROM purchases WHERE product_id = 1 ) pORDER BY purchased_at DESC;
Leading expressions in ORDER BY
have to agree with columns in DISTINCT ON
, so you can't order by different columns in the same SELECT
.
Only use an additional ORDER BY
in the subquery if you want to pick a particular row from each set:
SELECT *FROM ( SELECT DISTINCT ON (address_id) * FROM purchases WHERE product_id = 1 ORDER BY address_id, purchased_at DESC -- get "latest" row per address_id ) pORDER BY purchased_at DESC;
If purchased_at
can be NULL
, use DESC NULLS LAST
- and match your index for best performance. See:
- Sort by column ASC, but NULL values first?
- Why does ORDER BY NULLS LAST affect the query plan on a primary key?
Related, with more explanation: