Finding gaps in huge event streams?


In Postgres this is easy to do with the help of the lag() window function. See the fiddle below for an example:

SQL Fiddle

PostgreSQL 9.3 Schema Setup:

CREATE TABLE Table1
    ("id" int, "stream_id" int, "timestamp" timestamp);
INSERT INTO Table1
    ("id", "stream_id", "timestamp")
VALUES
    (1, 7, '2015-06-01 15:20:30'),
    (2, 7, '2015-06-01 15:20:31'),
    (3, 7, '2015-06-01 15:20:32'),
    (4, 7, '2015-06-01 15:25:30'),
    (5, 7, '2015-06-01 15:25:31');

Query 1:

with c as (
    select *,
           lag("timestamp") over(partition by stream_id order by id) as pre_time,
           lag(id) over(partition by stream_id order by id) as pre_id
    from Table1
)
select * from c where "timestamp" - pre_time > interval '2 sec'

Results:

| id | stream_id |              timestamp |               pre_time | pre_id |
|----|-----------|------------------------|------------------------|--------|
|  4 |         7 | June, 01 2015 15:25:30 | June, 01 2015 15:20:32 |      3 |


You can do this with the lag() window function over a partition by stream_id, ordered by the timestamp. lag() gives you access to earlier rows in the partition; if you don't pass an offset, it defaults to 1, i.e. the previous row. So if the partition on stream_id is ordered by time, the previous row is the previous event for that stream_id.

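-- Window functions and column aliases can't be used in WHERE at the same
-- query level, so compute diff in a subquery and filter in the outer query.
SELECT stream_id, start_id, end_id, diff
FROM (
    SELECT stream_id,
           lag(id) OVER pair AS start_id,
           id AS end_id,
           ("timestamp" - lag("timestamp") OVER pair) AS diff
    FROM my_table
    WINDOW pair AS (PARTITION BY stream_id ORDER BY "timestamp")
) AS gaps
WHERE diff > interval '2 minutes';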
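
To see what lag() returns with and without an explicit offset (its second argument), here is a small sketch you can paste into psql. The CTE name events, the column name ts and the output aliases are made up for illustration; the rows are the same sample data as Table1 in the other answer, inlined so the query runs on its own.

```sql
-- Sample rows inlined via VALUES so no table is needed (names are illustrative).
WITH events(id, stream_id, ts) AS (
    VALUES
        (1, 7, timestamp '2015-06-01 15:20:30'),
        (2, 7, timestamp '2015-06-01 15:20:31'),
        (3, 7, timestamp '2015-06-01 15:20:32'),
        (4, 7, timestamp '2015-06-01 15:25:30'),
        (5, 7, timestamp '2015-06-01 15:25:31')
)
SELECT id,
       ts,
       lag(ts)    OVER w AS prev_ts,    -- no offset given: defaults to 1, the previous row
       lag(ts, 1) OVER w AS prev_ts_1,  -- same result with the offset written out
       lag(ts, 2) OVER w AS prev_ts_2   -- two rows back in the same partition
FROM events
WINDOW w AS (PARTITION BY stream_id ORDER BY ts)
ORDER BY id;
```

For the first row of each partition there is no earlier row, so lag() returns NULL. The difference against NULL is NULL as well, so a filter like diff > interval '2 minutes' never reports the first event of a stream as a gap.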