SQL join: selecting the last records in a one-to-many relationship
This is an example of the greatest-n-per-group
problem that has appeared regularly on StackOverflow.
Here's how I usually recommend solving it:
SELECT c.*, p1.*FROM customer cJOIN purchase p1 ON (c.id = p1.customer_id)LEFT OUTER JOIN purchase p2 ON (c.id = p2.customer_id AND (p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id)))WHERE p2.id IS NULL;
Explanation: given a row p1
, there should be no row p2
with the same customer and a later date (or in the case of ties, a later id
). When we find that to be true, then p1
is the most recent purchase for that customer.
Regarding indexes, I'd create a compound index in purchase
over the columns (customer_id
, date
, id
). That may allow the outer join to be done using a covering index. Be sure to test on your platform, because optimization is implementation-dependent. Use the features of your RDBMS to analyze the optimization plan. E.g. EXPLAIN
on MySQL.
Some people use subqueries instead of the solution I show above, but I find my solution makes it easier to resolve ties.
You could also try doing this using a sub select
SELECT c.*, p.*FROM customer c INNER JOIN ( SELECT customer_id, MAX(date) MaxDate FROM purchase GROUP BY customer_id ) MaxDates ON c.id = MaxDates.customer_id INNER JOIN purchase p ON MaxDates.customer_id = p.customer_id AND MaxDates.MaxDate = p.date
The select should join on all customers and their Last purchase date.
Another approach would be to use a NOT EXISTS
condition in your join condition to test for later purchases:
SELECT *FROM customer cLEFT JOIN purchase p ON ( c.id = p.customer_id AND NOT EXISTS ( SELECT 1 FROM purchase p1 WHERE p1.customer_id = c.id AND p1.id > p.id ))