Performance of max() vs ORDER BY DESC + LIMIT 1 Performance of max() vs ORDER BY DESC + LIMIT 1 postgresql postgresql

Performance of max() vs ORDER BY DESC + LIMIT 1


There does not seem to be an index on sensor.station_id, which is most probably important here.

There is an actual difference between max() and ORDER BY DESC + LIMIT 1. Many people seem to miss that. NULL values sort first in descending sort order. So ORDER BY timestamp DESC LIMIT 1 returns a row with timestamp IS NULL if it exists, while the aggregate function max() ignores NULL values and returns the latest non-null timestamp.

For your case, since your column d.timestamp is defined NOT NULL (as your update revealed), there is no effective difference. An index with DESC NULLS LAST and the same clause in the ORDER BY for the LIMIT query should still serve you best. I suggest these indexes (my query below builds on the 2nd one):

sensor(station_id, id)data(sensor_id, timestamp DESC NULLS LAST)

You can drop the other index variants sensor_ind_timestamp and sensor_ind_timestamp_desc unless you have other queries that still require them (unlikely, but possible).

Much more importantly, there is another difficulty: The filter on the first table sensors returns few, but still (possibly) multiple rows. Postgres expects to find 2 rows (rows=2) in your added EXPLAIN output.
The perfect technique would be a loose index scan for the second table data - which is not currently implemented in Postgres 9.4 (or Postgres 9.5). You can rewrite the query to work around this limitation in various ways. Details:

The best should be:

SELECT d.timestampFROM   sensors sCROSS  JOIN LATERAL  (   SELECT timestamp   FROM   data   WHERE  sensor_id = s.id   ORDER  BY timestamp DESC NULLS LAST   LIMIT  1   ) dWHERE  s.station_id = 4ORDER  BY d.timestamp DESC NULLS LASTLIMIT  1;

Since the style of outer query is mostly irrelevant, you can also just:

SELECT max(d.timestamp) AS timestampFROM   sensors sCROSS  JOIN LATERAL  (   SELECT timestamp   FROM   data   WHERE  sensor_id = s.id   ORDER  BY timestamp DESC NULLS LAST   LIMIT  1   ) dWHERE  s.station_id = 4;

And the max() variant should perform about as fast now:

SELECT max(d.timestamp) AS timestampFROM   sensors sCROSS  JOIN LATERAL  (   SELECT max(timestamp) AS timestamp   FROM   data   WHERE  sensor_id = s.id   ) dWHERE  s.station_id = 4;

Or even, shortest of all:

SELECT max((SELECT max(timestamp) FROM data WHERE sensor_id = s.id)) AS timestampFROM   sensors sWHERE  station_id = 4;

Note the double parentheses!

The additional advantage of LIMIT in a LATERAL join is that you can retrieve arbitrary columns of the selected row, not just the latest timestamp (one column).

Related:


The query plan shows index names timestamp_ind and timestamp_sensor_ind. But indexes like that do not help with a search for a particular sensor.

To resolve an equals query (like sensor.id = data.sensor_id) the column has to be the first in the index. Try to add an index that allows searching on sensor_id and, within a sensor, is sorted by timestamp:

create index sensor_timestamp_ind on data(sensor_id, timestamp);

Does adding that index speed up the query?