PostgreSQL splitting time range into days
First off, your upper border concept is broken. A timestamp with 23:59:59
is no good. The data type timestamp
has fractional digits. What about 2013-10-18 23:59:59.123::timestamp
?
Include the lower border and exclude the upper border everywhere in your logic. Compare:
Building on this premise:
Postgres 9.2 or older
SELECT id , stime , etimeFROM timesheet_entries tWHERE etime <= stime::date + 1 -- this includes upper border 00:00UNION ALLSELECT id , CASE WHEN stime::date = d THEN stime ELSE d END -- AS stime , CASE WHEN etime::date = d THEN etime ELSE d + 1 END -- AS etimeFROM ( SELECT id , stime , etime , generate_series(stime::date, etime::date, interval '1d')::date AS d FROM timesheet_entries t WHERE etime > stime::date + 1 ) subORDER BY id, stime;
Or simply:
SELECT id , CASE WHEN stime::date = d THEN stime ELSE d END -- AS stime , CASE WHEN etime::date = d THEN etime ELSE d + 1 END -- AS etimeFROM ( SELECT id , stime , etime , generate_series(stime::date, etime::date, interval '1d')::date AS d FROM timesheet_entries t ) subORDER BY id, stime;
The simpler one may even be faster.
Note a corner case difference when stime
and etime
both fall on 00:00
exactly. Then a row with a zero time range is added at the end. There are various ways to deal with that. I propose:
SELECT *FROM ( SELECT id , CASE WHEN stime::date = d THEN stime ELSE d END AS stime , CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etime FROM ( SELECT id , stime , etime , generate_series(stime::date, etime::date, interval '1d')::date AS d FROM timesheet_entries t ) sub1 ORDER BY id, stime ) sub2WHERE etime <> stime;
Postgres 9.3+
In Postgres 9.3+ you would better use LATERAL
for this
SELECT id , CASE WHEN stime::date = d THEN stime ELSE d END AS stime , CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etimeFROM timesheet_entries t , LATERAL (SELECT d::date FROM generate_series(t.stime::date, t.etime::date, interval '1d') d ) dORDER BY id, stime;
Details in the manual.
Same corner case as above.
SQL Fiddle demonstrating all.
There is simply solution (if intervals starts in same time)
postgres=# select i, i + interval '1day' - interval '1sec' from generate_series('2013-01-01 00:00:00'::timestamp, '2013-01-02 23:59:59', '1day') g(i); i │ ?column? ─────────────────────┼───────────────────── 2013-01-01 00:00:00 │ 2013-01-01 23:59:59 2013-01-02 00:00:00 │ 2013-01-02 23:59:59(2 rows)
I wrote a table function, that do it for any interval. It is fast - two years range divide to 753 ranges in 10ms
create or replace function day_ranges(timestamp, timestamp)returns table(t1 timestamp, t2 timestamp) as $$begin t1 := $1; if $2 > $1 then loop if t1::date = $2::date then t2 := $2; return next; exit; end if; t2 := date_trunc('day', t1) + interval '1day' - interval '1sec'; return next; t1 := t2 + interval '1sec'; end loop; end if; return;end;$$ language plpgsql;
Result:
postgres=# select * from day_ranges('2013-10-08 22:00:00', '2013-10-10 23:00:00'); t1 │ t2 ─────────────────────┼───────────────────── 2013-10-08 22:00:00 │ 2013-10-09 23:59:59 2013-10-09 00:00:00 │ 2013-10-09 23:59:59 2013-10-10 00:00:00 │ 2013-10-10 23:00:00(3 rows)Time: 6.794 ms
and faster (and little bit longer) version based on RETURN QUERY
create or replace function day_ranges(timestamp, timestamp)returns table(t1 timestamp, t2 timestamp) as $$begin t1 := $1; t2 := $2; if $1::date = $2::date then return next; else -- first day t2 := date_trunc('day', t1) + interval '1day' - interval '1sec'; return next; if $2::date > $1::date + 1 then return query select d, d + interval '1day' - interval '1sec' from generate_series(date_trunc('day', $1 + interval '1day')::timestamp, date_trunc('day', $2 - interval '1day')::timestamp, '1day') g(d); end if; -- last day t1 := date_trunc('day', $2); t2 := $2; return next; end if; return;end;$$ language plpgsql;
You may build the result by simply generating all days in the entire set of ranges and join this to the ranges themselves using the OVERLAPS operator. This will both discard the days for which there is no data and do the required cartesian product for the days with data.
OVERLAPS behavior is explained in the doc as this:
(start1, end1) OVERLAPS (start2, end2)
(start1, length1) OVERLAPS (start2, length2)
This expression yields true when two time periods (defined by their endpoints) overlap, false when they do not overlap. The endpoints can be specified as pairs of dates, times, or time stamps; or as a date, time, or time stamp followed by an interval. When a pair of values is provided, either the start or the end can be written first; OVERLAPS automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open interval start <= time < end, unless start and end are equal in which case it represents that single time instant. This means for instance that two time periods with only an endpoint in common do not overlap.
With your timesheet_entries
table, the query would be:
select days.day, timesheet_entries.* from (select day from generate_series( (select min(stime) from timesheet_entries), (select max(etime) from timesheet_entries), '1 day'::interval) day ) days join timesheet_entries on (stime,etime) overlaps (days.day,days.day+'1 day'::interval) order by 1;
This technique makes it also easy to add the days without data, calendar-like. To do so, just replace the join
by a left join
.
As pointed by @Erwin in his answer, etime
should not represent the last second (...:59:59
) of the interval but the next second as an excluded upper bound. It may not matter with your current data, but if certain rows had stime
at ...:59:59
or if you happened to need sub-second resolution, that would be a problem.