Finding gaps in huge event streams?
In PostgreSQL this can be done easily with the help of the lag() window function. See the fiddle below for an example:
PostgreSQL 9.3 Schema Setup:
    CREATE TABLE Table1 ("id" int, "stream_id" int, "timestamp" timestamp);

    INSERT INTO Table1 ("id", "stream_id", "timestamp")
    VALUES
        (1, 7, '2015-06-01 15:20:30'),
        (2, 7, '2015-06-01 15:20:31'),
        (3, 7, '2015-06-01 15:20:32'),
        (4, 7, '2015-06-01 15:25:30'),
        (5, 7, '2015-06-01 15:25:31');
Query 1:
    with c as (
        select *,
               lag("timestamp") over (partition by stream_id order by id) as pre_time,
               lag(id) over (partition by stream_id order by id) as pre_id
        from Table1
    )
    select *
    from c
    where "timestamp" - pre_time > interval '2 sec'

| id | stream_id | timestamp              | pre_time               | pre_id |
|----|-----------|------------------------|------------------------|--------|
| 4  | 7         | June, 01 2015 15:25:30 | June, 01 2015 15:20:32 | 3      |
You can do this with the lag() window function over a partition by stream_id that is ordered by the timestamp. The lag() function gives you access to earlier rows in the partition; without an explicit offset argument, it returns the value from the immediately preceding row. So if the partition on stream_id is ordered by time, the previous row is the previous event for that stream_id.
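    -- Window functions are evaluated after WHERE, and a SELECT alias is not
    -- visible there, so compute diff in a subquery and filter in the outer query.
    SELECT *
    FROM (
        SELECT stream_id,
               lag(id) OVER pair AS start_id,
               id AS end_id,
               "timestamp" - lag("timestamp") OVER pair AS diff
        FROM my_table
        WINDOW pair AS (PARTITION BY stream_id ORDER BY "timestamp")
    ) AS gaps
    WHERE diff > interval '2 minutes';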
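To see what the offset argument of lag() changes, here is a minimal sketch that runs against the Table1 sample data from the schema setup above. The window name w and the column aliases prev_1 and prev_2 are made up for illustration; they do not come from either answer.

```sql
-- Sketch: compare the default offset (1) with an explicit offset of 2.
-- Assumes the Table1 sample data from the schema setup has been loaded.
SELECT id,
       "timestamp",
       lag("timestamp")    OVER w AS prev_1,  -- one row back (default offset)
       lag("timestamp", 2) OVER w AS prev_2   -- two rows back
FROM Table1
WINDOW w AS (PARTITION BY stream_id ORDER BY "timestamp")
ORDER BY id;
```

For the row with id 4, prev_1 is the timestamp of id 3 (15:20:32) and prev_2 is the timestamp of id 2 (15:20:31). The first row of each stream has no predecessor, so lag() returns NULL there. Comparing that NULL difference with an interval is never true, which is why the first event of a stream is not reported as a gap.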