PostgreSQL , Select from 2 tables, but only the latest element from table 2 PostgreSQL , Select from 2 tables, but only the latest element from table 2 postgresql postgresql

PostgreSQL , Select from 2 tables, but only the latest element from table 2


Use PostgreSQL extension DISTINCT ON:

SELECT  DISTINCT ON (documents.id) *FROM    documentJOIN    updatesON      updates.document_id = document_idORDER BY        documents.id, updates.date DESC

This will take the first row from each document.id cluster in ORDER BY order.

Test script to check:

SELECT  DISTINCT ON (documents.id) *FROM    (        VALUES        (1, 'Test Title'),        (2, 'Test Title 2')        ) documents (id, title)JOIN    (        VALUES        (1, 1, '2006-01-01'::DATE),        (2, 1, '2007-01-01'::DATE),        (3, 1, '2008-01-01'::DATE),        (4, 2, '2009-01-01'::DATE),        (5, 2, '2010-01-01'::DATE)        ) updates (id, document_id, date)ON      updates.document_id = documents.idORDER BY        documents.id, updates.date DESC


You may create a derived table which contains only the most recent "updates" records per document_id, and then join "documents" against that:

SELECT d.id, d.title, u.update_id, u."date"FROM documents dLEFT JOIN-- JOIN "documents" against the most recent update per document_id(SELECT recent.document_id, id AS update_id, recent."date"FROM updatesINNER JOIN(SELECT document_id, MAX("date") AS "date" FROM updates GROUP BY 1) recentON updates.document_id = recent.document_idWHERE  updates."date" = recent."date") uON d.id = u.document_id;

This will handle "un-updated" documents, like so:

pg=> select * from documents; id | title ----+-------  1 | foo  2 | bar  3 | baz(3 rows)pg=> select * from updates; id | document_id |    date    ----+-------------+------------  1 |           1 | 2009-10-30  2 |           1 | 2009-11-04  3 |           1 | 2009-11-07  4 |           2 | 2009-11-09(4 rows)pg=> SELECT d.id ... id | title | update_id |    date    ----+-------+-----------+------------  1 | foo   |         3 | 2009-11-07  2 | bar   |         4 | 2009-11-09  3 | baz   |           | (3 rows)


select *from documentsleft join updates  on updates.document_id=documents.id  and updates.date=(select max(date) from updates where document_id=documents.id)where documents.id=?;

It has the some advantages over previous answers:

  • you can write document_id only in one place which is convenient;
  • you can omit where and you'll get a table of all documents and their latest updates;
  • you can use more broad selection criteria, for example where documents.id in (1,2,3).

You can also avoid a subselect using group by, but you'll have to list all fields of documents in group by clause:

select documents.*, max(date) as max_date  from documents  left join updates on documents.id=document_id  where documents.id=1  group by documents.id, title;