PostgreSQL splitting time range into days PostgreSQL splitting time range into days postgresql postgresql

PostgreSQL splitting time range into days


First off, your upper border concept is broken. A timestamp with 23:59:59 is no good. The data type timestamp has fractional digits. What about 2013-10-18 23:59:59.123::timestamp?

Include the lower border and exclude the upper border everywhere in your logic. Compare:

Building on this premise:

Postgres 9.2 or older

SELECT id     , stime     , etimeFROM   timesheet_entries tWHERE  etime <= stime::date + 1  -- this includes upper border 00:00UNION ALLSELECT id     , CASE WHEN stime::date = d THEN stime ELSE d END     -- AS stime     , CASE WHEN etime::date = d THEN etime ELSE d + 1 END -- AS etimeFROM (   SELECT id        , stime        , etime        , generate_series(stime::date, etime::date, interval '1d')::date AS d   FROM   timesheet_entries t   WHERE  etime > stime::date + 1   ) subORDER  BY id, stime;

Or simply:

SELECT id     , CASE WHEN stime::date = d THEN stime ELSE d END     -- AS stime     , CASE WHEN etime::date = d THEN etime ELSE d + 1 END -- AS etimeFROM (   SELECT id        , stime        , etime        , generate_series(stime::date, etime::date, interval '1d')::date AS d   FROM   timesheet_entries t   ) subORDER  BY id, stime;

The simpler one may even be faster.
Note a corner case difference when stime and etime both fall on 00:00 exactly. Then a row with a zero time range is added at the end. There are various ways to deal with that. I propose:

SELECT *FROM  (   SELECT id        , CASE WHEN stime::date = d THEN stime ELSE d END     AS stime        , CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etime   FROM (      SELECT id           , stime           , etime           , generate_series(stime::date, etime::date, interval '1d')::date AS d      FROM   timesheet_entries t      ) sub1   ORDER  BY id, stime   ) sub2WHERE  etime <> stime;

Postgres 9.3+

In Postgres 9.3+ you would better use LATERAL for this

SELECT id     , CASE WHEN stime::date = d THEN stime ELSE d END     AS stime     , CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etimeFROM   timesheet_entries t     , LATERAL (SELECT d::date                FROM   generate_series(t.stime::date, t.etime::date, interval '1d') d                ) dORDER  BY id, stime;

Details in the manual.
Same corner case as above.

SQL Fiddle demonstrating all.


There is simply solution (if intervals starts in same time)

postgres=# select i, i + interval '1day' - interval '1sec'   from generate_series('2013-01-01 00:00:00'::timestamp, '2013-01-02 23:59:59', '1day') g(i);          i          │      ?column?       ─────────────────────┼───────────────────── 2013-01-01 00:00:00 │ 2013-01-01 23:59:59 2013-01-02 00:00:00 │ 2013-01-02 23:59:59(2 rows)

I wrote a table function, that do it for any interval. It is fast - two years range divide to 753 ranges in 10ms

create or replace function day_ranges(timestamp, timestamp)returns table(t1 timestamp, t2 timestamp) as $$begin  t1 := $1;  if $2 > $1 then    loop      if t1::date = $2::date then        t2 := $2;        return next;        exit;      end if;      t2 := date_trunc('day', t1) + interval '1day' - interval '1sec';      return next;      t1 := t2 + interval '1sec';    end loop;  end if;  return;end;$$ language plpgsql;

Result:

postgres=# select * from day_ranges('2013-10-08 22:00:00', '2013-10-10 23:00:00');         t1          │         t2          ─────────────────────┼───────────────────── 2013-10-08 22:00:00 │ 2013-10-09 23:59:59 2013-10-09 00:00:00 │ 2013-10-09 23:59:59 2013-10-10 00:00:00 │ 2013-10-10 23:00:00(3 rows)Time: 6.794 ms

and faster (and little bit longer) version based on RETURN QUERY

create or replace function day_ranges(timestamp, timestamp)returns table(t1 timestamp, t2 timestamp) as $$begin  t1 := $1; t2 := $2;  if $1::date = $2::date then    return next;  else    -- first day    t2 := date_trunc('day', t1) + interval '1day' - interval '1sec';    return next;    if $2::date > $1::date + 1 then      return query select d, d + interval '1day' - interval '1sec'                      from generate_series(date_trunc('day', $1 + interval '1day')::timestamp,                                           date_trunc('day', $2 - interval '1day')::timestamp,                                           '1day') g(d);    end if;    -- last day     t1 := date_trunc('day', $2); t2 := $2;    return next;  end if;  return;end;$$ language plpgsql;


You may build the result by simply generating all days in the entire set of ranges and join this to the ranges themselves using the OVERLAPS operator. This will both discard the days for which there is no data and do the required cartesian product for the days with data.

OVERLAPS behavior is explained in the doc as this:

(start1, end1) OVERLAPS (start2, end2)
(start1, length1) OVERLAPS (start2, length2)
This expression yields true when two time periods (defined by their endpoints) overlap, false when they do not overlap. The endpoints can be specified as pairs of dates, times, or time stamps; or as a date, time, or time stamp followed by an interval. When a pair of values is provided, either the start or the end can be written first; OVERLAPS automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open interval start <= time < end, unless start and end are equal in which case it represents that single time instant. This means for instance that two time periods with only an endpoint in common do not overlap.

With your timesheet_entries table, the query would be:

select days.day, timesheet_entries.* from  (select day from generate_series(         (select min(stime) from timesheet_entries),         (select max(etime) from timesheet_entries),         '1 day'::interval) day  ) days join timesheet_entries   on (stime,etime) overlaps (days.day,days.day+'1 day'::interval) order by 1;

This technique makes it also easy to add the days without data, calendar-like. To do so, just replace the join by a left join.

As pointed by @Erwin in his answer, etime should not represent the last second (...:59:59) of the interval but the next second as an excluded upper bound. It may not matter with your current data, but if certain rows had stime at ...:59:59 or if you happened to need sub-second resolution, that would be a problem.