Multiple DISTINCT ON clauses in PostgreSQL
As commented, there is ambiguity in the question. The number of result rows can differ for every call. If you are satisfied with arbitrary results, @klin's solution is good enough. Else, you need to define requiremens more closely. Like:
- distinct on
(name, birth)
, pick smallest height first, then smallest ID as tiebreaker - distinct on
(name, height)
, pick earliest birth first, then smallest ID as tiebreaker
Your table should have a primary key (or some way to identify rows uniquely):
CREATE TEMP TABLE tbl ( tbl_id serial PRIMARY KEY, name text, birth int, height numeric);INSERT INTO tbl (name, birth, height)VALUES ('William', 1976, 1.82), ('James', 1981, 1.68), ('Mike', 1976, 1.68), ('Tom', 1967, 1.79), ('William', 1976, 1.74), ('William', 1981, 1.82), ('Tom', 1978, 1.92), ('Mike', 1963, 1.68), ('Tom', 1971, 1.86), ('James', 1981, 1.77), ('Tom', 1971, 1.89);
Query:
SELECT DISTINCT ON (name, height) *FROM ( SELECT DISTINCT ON (name, birth) * FROM tbl ORDER BY name, birth, height, tbl_id -- pick smallest height, ID as tiebreaker ) subORDER BY name, height, birth, tbl_id; -- pick earliest birth, ID as tiebreaker
tbl_id | name | birth | height--------+---------+-------+-------- 2 | James | 1981 | 1.68 8 | Mike | 1963 | 1.68 4 | Tom | 1967 | 1.79 9 | Tom | 1971 | 1.86 7 | Tom | 1978 | 1.92 5 | William | 1976 | 1.74 6 | William | 1981 | 1.82(7 rows) -- !!!
A DISTINCT ON
query without deterministic ORDER BY
can return any arbitrary row from each set of dupes. Applied once, you still get a deterministic number of rows (with arbitrary picks). Applied repeatedly, the resulting number of rows is arbitrary, too. Related:
Use a derived table:
with my_table(name, birth, height) as (values('William', 1976, 1.82),('James', 1981, 1.68),('Mike', 1976, 1.68),('Tom', 1967, 1.79),('William', 1976, 1.74), -- ? (name, birth)('William', 1981, 1.82), -- ? (name, height)('Tom', 1978, 1.92),('Mike', 1963, 1.68), -- ? (name, height)('Tom', 1971, 1.86),('James', 1981, 1.77), -- ? (name, birth)('Tom', 1971, 1.89) -- ? (name, birth))select distinct on (name, height) *from ( select distinct on (name, birth) * from my_table ) s name | birth | height ---------+-------+-------- James | 1981 | 1.68 Mike | 1963 | 1.68 Tom | 1967 | 1.79 Tom | 1971 | 1.89 Tom | 1978 | 1.92 William | 1976 | 1.82(6 rows)