Best way to count records by arbitrary time intervals in Rails+Postgres
Luckily, you are using PostgreSQL. The set-returning function generate_series()
is your friend.
Test case
Given the following test table (which you should have provided):
    CREATE TABLE event(event_id serial, ts timestamp);
    INSERT INTO event (ts)
    SELECT generate_series(timestamp '2018-05-01'
                         , timestamp '2018-05-08'
                         , interval '7 min') + random() * interval '7 min';
One event for every 7 minutes (plus 0 to 7 minutes, randomly).
Basic solution
This query counts events for any arbitrary time interval. 17 minutes in the example:
    WITH grid AS (
       SELECT start_time
            , lead(start_time, 1, 'infinity') OVER (ORDER BY start_time) AS end_time
       FROM  (
          SELECT generate_series(min(ts), max(ts), interval '17 min') AS start_time
          FROM   event
          ) sub
       )
    SELECT start_time, count(e.ts) AS events
    FROM   grid       g
    LEFT   JOIN event e ON e.ts >= g.start_time
                       AND e.ts <  g.end_time
    GROUP  BY start_time
    ORDER  BY start_time;
The query retrieves minimum and maximum ts from the base table to cover the complete time range. You can use an arbitrary time range instead.
Provide any time interval as needed.
Produces one row for every time slot. If no event happened during that interval, the count is 0.
Be sure to handle upper and lower bound correctly:
The window function lead() has an often overlooked feature: it can provide a default for when no leading row exists. Providing 'infinity' in the example. Else the last interval would be cut off with an upper bound NULL.
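To see the default of lead() in isolation, here is a minimal sketch that builds only the grid, with no base table involved. The fixed bounds (2018-05-01 00:00 to 00:51) and the alias g are assumptions chosen for illustration:

```sql
-- Four 17-minute slots; end_time of each slot is the start of the next one.
-- The third argument of lead() supplies 'infinity' where no next row exists.
SELECT start_time
     , lead(start_time, 1, 'infinity') OVER (ORDER BY start_time) AS end_time
FROM   generate_series(timestamp '2018-05-01 00:00'
                     , timestamp '2018-05-01 00:51'
                     , interval '17 min') g(start_time);
```

The last row (start_time 00:51) gets end_time infinity. Without the third argument it would be NULL, and the join condition e.ts < g.end_time could never be true for that slot.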
Minimal equivalent
The above query uses a CTE and lead() and verbose syntax. Elegant and maybe easier to understand, but a bit more expensive. Here is a shorter, faster, minimal version:
    SELECT start_time, count(e.ts) AS events
    FROM  (SELECT generate_series(min(ts), max(ts), interval '17 min') FROM event) g(start_time)
    LEFT   JOIN event e ON e.ts >= g.start_time
                       AND e.ts <  g.start_time + interval '17 min'
    GROUP  BY 1
    ORDER  BY 1;
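If the time range and the interval need to vary per call, the minimal query could be wrapped in a function. This is only a sketch: the function name f_event_counts and the parameter names _start, _end and _step are hypothetical and not part of the original answer. It assumes the event table from the test case.

```sql
-- Hypothetical wrapper around the minimal query; all names are illustrative.
CREATE FUNCTION f_event_counts(_start timestamp, _end timestamp, _step interval)
  RETURNS TABLE (start_time timestamp, events bigint)
  LANGUAGE sql STABLE AS
$func$
SELECT g.start_time, count(e.ts)
FROM   generate_series(_start, _end, _step) g(start_time)
LEFT   JOIN event e ON e.ts >= g.start_time
                   AND e.ts <  g.start_time + _step   -- half-open slot
GROUP  BY 1
ORDER  BY 1;
$func$;

-- Example call: 17-minute slots over the test week
SELECT * FROM f_event_counts(timestamp '2018-05-01', timestamp '2018-05-08', interval '17 min');
```

From Rails, such a call could be issued as plain SQL with the three values passed as bind parameters.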
Example for "every 15 minutes in the past week"
And formatting with to_char().
    SELECT to_char(start_time, 'YYYY-MM-DD HH24:MI'), count(e.ts) AS events
    FROM   generate_series(date_trunc('day', localtimestamp - interval '7 days')
                         , localtimestamp
                         , interval '15 min') g(start_time)
    LEFT   JOIN event e ON e.ts >= g.start_time
                       AND e.ts <  g.start_time + interval '15 min'
    GROUP  BY start_time
    ORDER  BY start_time;
Still ORDER BY and GROUP BY on the underlying timestamp value, not on the formatted string. That's faster and more reliable.
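A small sketch of why sorting on the formatted string can go wrong. The format 'DD.MM.YYYY' and the two sample values are assumptions for illustration; the 'YYYY-MM-DD HH24:MI' pattern used above happens to sort correctly as text, but many display formats do not:

```sql
-- ORDER BY 1 sorts the text output of to_char(), not the timestamps.
SELECT to_char(ts, 'DD.MM.YYYY') AS formatted
FROM  (VALUES (timestamp '2018-05-02')
            , (timestamp '2018-06-01')) t(ts)
ORDER  BY 1;
```

Here '01.06.2018' sorts before '02.05.2018', although June 1 comes after May 2. Ordering by ts instead keeps the chronological order regardless of the display format.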
Related answer producing a running count over the time frame: