Best way to count records by arbitrary time intervals in Rails+Postgres

sql ruby-on-rails postgresql aggregate-functions generate-series

Luckily, you are using PostgreSQL. The set-returning function generate_series() is your friend.

Test case

Given the following test table (which you should have provided):

CREATE TABLE event(event_id serial, ts timestamp);
INSERT INTO event (ts)
SELECT generate_series(timestamp '2018-05-01'
                     , timestamp '2018-05-08'
                     , interval '7 min') + random() * interval '7 min';

One event for every 7 minutes (plus 0 to 7 minutes, randomly).

Basic solution

This query counts events for any arbitrary time interval. 17 minutes in the example:

WITH grid AS (
   SELECT start_time
        , lead(start_time, 1, 'infinity') OVER (ORDER BY start_time) AS end_time
   FROM  (
      SELECT generate_series(min(ts), max(ts), interval '17 min') AS start_time
      FROM   event
      ) sub
   )
SELECT start_time, count(e.ts) AS events
FROM   grid       g
LEFT   JOIN event e ON e.ts >= g.start_time
                   AND e.ts <  g.end_time
GROUP  BY start_time
ORDER  BY start_time;

- The query retrieves the minimum and maximum ts from the base table to cover the complete time range. You can use an arbitrary time range instead.
- Provide any time interval as needed.
- It produces one row for every time slot. If no event happened during that interval, the count is 0.
- Be sure to handle upper and lower bounds correctly (lower bound inclusive, upper bound exclusive). See:
  - Unexpected results from SQL query with BETWEEN timestamps
- The window function lead() has an often overlooked feature: it can provide a default for when no leading row exists. The example provides 'infinity'. Otherwise the last interval would get an upper bound of NULL, the comparison e.ts < NULL would never be true, and events in the last slot would not be counted.
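To see why the bounds matter, here is a minimal plain-Ruby sketch of the same bucketing logic, outside the database. The helper names (slot_starts, half_open_counts, between_counts) are hypothetical and only illustrate the comparison; the SQL above remains the actual solution. An event that falls exactly on a slot boundary is counted once with >= and <, but twice with inclusive bounds on both sides (which is what BETWEEN does).

```ruby
# Slot start times, like generate_series(from, to, step): both ends inclusive.
def slot_starts(from, to, step)
  starts = []
  t = from
  while t <= to
    starts << t
    t += step # Time + Integer adds seconds
  end
  starts
end

# Half-open slots [start, start + step), mirroring
# e.ts >= g.start_time AND e.ts < g.start_time + interval.
def half_open_counts(events, starts, step)
  starts.map { |s| events.count { |ts| ts >= s && ts < s + step } }
end

# Inclusive on both sides, which is how BETWEEN behaves.
def between_counts(events, starts, step)
  starts.map { |s| events.count { |ts| ts >= s && ts <= s + step } }
end

step   = 17 * 60 # 17 minutes in seconds
from   = Time.utc(2018, 5, 1)
events = [from, from + step, from + step + 60] # second event sits exactly on a boundary
starts = slot_starts(from, events.max, step)

p half_open_counts(events, starts, step) # => [1, 2]  (3 events, 3 counted)
p between_counts(events, starts, step)   # => [2, 2]  (boundary event counted twice)
```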

Minimal equivalent

The above query uses a CTE and lead() and verbose syntax. Elegant and maybe easier to understand, but a bit more expensive. Here is a shorter, faster, minimal version:

SELECT start_time, count(e.ts) AS events
FROM  (SELECT generate_series(min(ts), max(ts), interval '17 min') FROM event) g(start_time)
LEFT   JOIN event e ON e.ts >= g.start_time
                   AND e.ts <  g.start_time + interval '17 min'
GROUP  BY 1
ORDER  BY 1;
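Since the question is about Rails, one way to make the interval arbitrary is to build the minimal query from a slot width passed in from Ruby. This is only a sketch under assumptions: events_per_slot_sql is a hypothetical helper name, the table and column names (event, ts) come from the test case above, and the slot width is restricted to whole minutes so that only a validated integer is interpolated into the SQL string.

```ruby
# Hypothetical helper (not part of the original answer): returns the minimal
# query as a string for a slot width given in whole minutes.
def events_per_slot_sql(minutes)
  minutes = Integer(minutes) # raises on non-numeric input, so only an integer is interpolated
  raise ArgumentError, "minutes must be positive" unless minutes.positive?

  <<~SQL
    SELECT start_time, count(e.ts) AS events
    FROM  (SELECT generate_series(min(ts), max(ts), interval '#{minutes} min') FROM event) g(start_time)
    LEFT   JOIN event e ON e.ts >= g.start_time
                       AND e.ts <  g.start_time + interval '#{minutes} min'
    GROUP  BY 1
    ORDER  BY 1
  SQL
end

puts events_per_slot_sql(17)
```

In a Rails app the resulting string could then be run with, for example, ActiveRecord::Base.connection.select_all(events_per_slot_sql(17)).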

Example for "every 15 minutes in the past week"

And formatting with to_char().

SELECT to_char(start_time, 'YYYY-MM-DD HH24:MI'), count(e.ts) AS events
FROM   generate_series(date_trunc('day', localtimestamp - interval '7 days')
                     , localtimestamp
                     , interval '15 min') g(start_time)
LEFT   JOIN event e ON e.ts >= g.start_time
                   AND e.ts <  g.start_time + interval '15 min'
GROUP  BY start_time
ORDER  BY start_time;

Note that ORDER BY and GROUP BY still operate on the underlying timestamp value, not on the formatted string. That's faster and more reliable.
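One reading of "more reliable" can be illustrated in plain Ruby, using strftime as a stand-in for to_char(). The helper names and the alternative formats below are assumptions chosen for the demonstration: a format like day.month.year does not sort chronologically as a string, and a format that drops the date merges slots from different days into one group.

```ruby
# Sort on the real time value, format only for display.
def sort_then_format(slots, fmt)
  slots.sort.map { |t| t.strftime(fmt) }
end

# Format first, then sort the strings (what sorting on the formatted column does).
def format_then_sort(slots, fmt)
  slots.map { |t| t.strftime(fmt) }.sort
end

slots = [Time.utc(2018, 6, 1), Time.utc(2018, 5, 2)]
fmt   = "%d.%m.%Y %H:%M"

p sort_then_format(slots, fmt) # => ["02.05.2018 00:00", "01.06.2018 00:00"]  chronological
p format_then_sort(slots, fmt) # => ["01.06.2018 00:00", "02.05.2018 00:00"]  June before May

# Grouping on a lossy string merges distinct slots.
same_time = [Time.utc(2018, 5, 1, 0, 15), Time.utc(2018, 5, 2, 0, 15)]
p same_time.group_by { |t| t.strftime("%H:%M") }.size # => 1
p same_time.uniq.size                                 # => 2
```

With the format used in the query above ('YYYY-MM-DD HH24:MI') the string order happens to match the chronological order, but working on the timestamp avoids depending on that.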


Related answer producing a running count over the time frame:

PostgreSQL: running count of rows for a query 'by minute'
