PostgreSQL , Select from 2 tables, but only the latest element from table 2
Use PostgreSQL
extension DISTINCT ON
:
SELECT DISTINCT ON (documents.id) *FROM documentJOIN updatesON updates.document_id = document_idORDER BY documents.id, updates.date DESC
This will take the first row from each document.id
cluster in ORDER BY
order.
Test script to check:
SELECT DISTINCT ON (documents.id) *FROM ( VALUES (1, 'Test Title'), (2, 'Test Title 2') ) documents (id, title)JOIN ( VALUES (1, 1, '2006-01-01'::DATE), (2, 1, '2007-01-01'::DATE), (3, 1, '2008-01-01'::DATE), (4, 2, '2009-01-01'::DATE), (5, 2, '2010-01-01'::DATE) ) updates (id, document_id, date)ON updates.document_id = documents.idORDER BY documents.id, updates.date DESC
You may create a derived table which contains only the most recent "updates" records per document_id, and then join "documents" against that:
SELECT d.id, d.title, u.update_id, u."date"FROM documents dLEFT JOIN-- JOIN "documents" against the most recent update per document_id(SELECT recent.document_id, id AS update_id, recent."date"FROM updatesINNER JOIN(SELECT document_id, MAX("date") AS "date" FROM updates GROUP BY 1) recentON updates.document_id = recent.document_idWHERE updates."date" = recent."date") uON d.id = u.document_id;
This will handle "un-updated" documents, like so:
pg=> select * from documents; id | title ----+------- 1 | foo 2 | bar 3 | baz(3 rows)pg=> select * from updates; id | document_id | date ----+-------------+------------ 1 | 1 | 2009-10-30 2 | 1 | 2009-11-04 3 | 1 | 2009-11-07 4 | 2 | 2009-11-09(4 rows)pg=> SELECT d.id ... id | title | update_id | date ----+-------+-----------+------------ 1 | foo | 3 | 2009-11-07 2 | bar | 4 | 2009-11-09 3 | baz | | (3 rows)
select *from documentsleft join updates on updates.document_id=documents.id and updates.date=(select max(date) from updates where document_id=documents.id)where documents.id=?;
It has the some advantages over previous answers:
- you can write document_id only in one place which is convenient;
- you can omit where and you'll get a table of all documents and their latest updates;
- you can use more broad selection criteria, for example
where documents.id in (1,2,3)
.
You can also avoid a subselect using group by, but you'll have to list all fields of documents in group by clause:
select documents.*, max(date) as max_date from documents left join updates on documents.id=document_id where documents.id=1 group by documents.id, title;