SQL join: selecting the last records in a one-to-many relationship SQL join: selecting the last records in a one-to-many relationship sql sql

SQL join: selecting the last records in a one-to-many relationship


This is an example of the greatest-n-per-group problem that has appeared regularly on StackOverflow.

Here's how I usually recommend solving it:

SELECT c.*, p1.*FROM customer cJOIN purchase p1 ON (c.id = p1.customer_id)LEFT OUTER JOIN purchase p2 ON (c.id = p2.customer_id AND     (p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id)))WHERE p2.id IS NULL;

Explanation: given a row p1, there should be no row p2 with the same customer and a later date (or in the case of ties, a later id). When we find that to be true, then p1 is the most recent purchase for that customer.

Regarding indexes, I'd create a compound index in purchase over the columns (customer_id, date, id). That may allow the outer join to be done using a covering index. Be sure to test on your platform, because optimization is implementation-dependent. Use the features of your RDBMS to analyze the optimization plan. E.g. EXPLAIN on MySQL.


Some people use subqueries instead of the solution I show above, but I find my solution makes it easier to resolve ties.


You could also try doing this using a sub select

SELECT  c.*, p.*FROM    customer c INNER JOIN        (            SELECT  customer_id,                    MAX(date) MaxDate            FROM    purchase            GROUP BY customer_id        ) MaxDates ON c.id = MaxDates.customer_id INNER JOIN        purchase p ON   MaxDates.customer_id = p.customer_id                    AND MaxDates.MaxDate = p.date

The select should join on all customers and their Last purchase date.


Another approach would be to use a NOT EXISTS condition in your join condition to test for later purchases:

SELECT *FROM customer cLEFT JOIN purchase p ON (       c.id = p.customer_id   AND NOT EXISTS (     SELECT 1 FROM purchase p1     WHERE p1.customer_id = c.id     AND p1.id > p.id   ))