
Calendar tables in PostgreSQL 9


In PostgreSQL, you can generate calendar tables of arbitrary length and granularity on the fly:

SELECT CAST('2011-01-01' AS DATE) + (n || ' hour')::INTERVAL
FROM   generate_series(0, 23) n;

This does not require recursion (as it would in some other systems) and is the preferred method for generating volatile result sets like this.


Calendar tables implement a space/time tradeoff. By using more space, some kinds of queries run in less time, because they can take advantage of indexes. They're safe as long as you're careful with the CHECK() constraints, and as long as you have administrative processes to take care of any constraints that your dbms doesn't support.

If your granularity is one minute, you'll need to generate about a half million rows for each year. A minimal calendar table would look like this.

2011-01-01 00:00:00
2011-01-01 00:01:00
2011-01-01 00:02:00
2011-01-01 00:03:00
2011-01-01 00:04:00
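
A minimal sketch of how you might generate those one-minute rows with generate_series; the 2011 start date and the one-year span are just assumptions:

-- One row per minute for all of 2011 (365 days, about 525,600 rows).
SELECT CAST('2011-01-01 00:00:00' AS TIMESTAMP)
       + (n || ' minute')::INTERVAL AS cal_minute
FROM   generate_series(0, 365*24*60 - 1) n;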

If you're doing "bucket" analysis, you might be better off with something like this.

bucket_start         bucket_end
-------------------  -------------------
2011-01-01 00:00:00  2011-01-01 00:01:00
2011-01-01 00:01:00  2011-01-01 00:02:00
2011-01-01 00:02:00  2011-01-01 00:03:00
2011-01-01 00:03:00  2011-01-01 00:04:00
2011-01-01 00:04:00  2011-01-01 00:05:00

Since SQL's BETWEEN operator includes both endpoints, you usually need to avoid it here. It's hard to express bucket_end as "bucket_start plus one minute, minus the smallest bit of time this server can recognize". (The danger is a value that's a microsecond greater than bucket_end, but still less than the next value for bucket_start.)
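
A sketch of a bucket query that sidesteps BETWEEN by using a half-open range instead; the events table and its event_ts column are hypothetical:

-- Count events per bucket with a half-open range, so a timestamp
-- exactly equal to bucket_end falls into the next bucket, not two.
SELECT c.bucket_start, COUNT(e.event_ts) AS events_in_bucket
FROM   calendar c
LEFT JOIN events e
       ON e.event_ts >= c.bucket_start
      AND e.event_ts <  c.bucket_end
GROUP BY c.bucket_start
ORDER BY c.bucket_start;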

If I were going to build that table, I'd probably do it like this. (Although I'd think harder about whether I should call it "calendar".)

create table calendar (
  bucket_start timestamp primary key,
  bucket_end timestamp unique,
  CHECK (bucket_end = bucket_start + interval '1' minute)
  -- You also want a "no gaps" constraint, but I don't think you
  -- can do that in a CHECK constraint in PostgreSQL. You might
  -- be able to use a trigger that counts the rows, and compares
  -- that count to the number of minutes between min(bucket_start)
  -- and max(bucket_start). Worst case, you can always run a report
  -- that counts the rows and sends you an email.
);

The UNIQUE constraint creates an implicit index in PostgreSQL.
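
If you want to see that for yourself, a quick query against pg_indexes (assuming the table was created as above) lists both the primary key index and the one backing the UNIQUE constraint:

-- Show the indexes PostgreSQL created for the calendar table.
SELECT indexname, indexdef
FROM   pg_indexes
WHERE  tablename = 'calendar';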

This query will insert one day's worth of rows (24 hours * 60 minutes) at a time.

insert into calendar
select coalesce(
               (select max(bucket_start) from calendar),
               cast('2011-01-01 00:00:00' as timestamp)
              )
             + cast((n || ' minute') as interval) as bucket_start,
       coalesce(
               (select max(bucket_start) from calendar),
               cast('2011-01-01 00:00:00' as timestamp)
              )
             + cast((n + 1 || ' minute') as interval) as bucket_end
from generate_series(1, (24*60)) n;

You can wrap that in a function to generate a year at a time. I'd probably try to commit fewer than a half million rows at a time.
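
A sketch of what such a wrapper might look like in PL/pgSQL; the function name and the fixed 365-day span are assumptions. Note that everything inside one function call runs in a single transaction, so if you want smaller commits you could run the daily INSERT from a client-side loop instead.

-- Hypothetical wrapper: repeat the daily INSERT 365 times to add
-- roughly one year of one-minute buckets.
create or replace function add_calendar_year()
returns void as $$
begin
  for d in 1..365 loop
    insert into calendar
    select coalesce((select max(bucket_start) from calendar),
                    cast('2011-01-01 00:00:00' as timestamp))
           + cast((n || ' minute') as interval),
           coalesce((select max(bucket_start) from calendar),
                    cast('2011-01-01 00:00:00' as timestamp))
           + cast((n + 1 || ' minute') as interval)
    from generate_series(1, 24*60) n;
  end loop;
end;
$$ language plpgsql;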

It shouldn't take too long to generate 20 million rows for testing, and another 20 million rows of "calendar" minutes. Long lunch. Maybe an afternoon in the sun.


In the data warehouses I built, I used separate CALENDAR and TIME_OF_DAY dimensions. The first dimension has a granularity of one day, and the second a granularity of one minute.

In two other cases I knew beforehand that no reporting would be required at a granularity finer than 15 minutes. In those cases, for simplicity, I used a single CALENDAR dimension with 96 records per day.
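
A sketch of how those 96 rows per day (one per 15-minute bucket) might be generated in PostgreSQL; the start date, one-year span, and column names are placeholders:

-- Hypothetical 15-minute calendar dimension: 96 rows per day for one year.
SELECT cast('2011-01-01 00:00:00' as timestamp)
       + (n * 15 || ' minute')::interval AS bucket_start,
       cast('2011-01-01 00:00:00' as timestamp)
       + ((n + 1) * 15 || ' minute')::interval AS bucket_end
FROM   generate_series(0, 365*96 - 1) n;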

I have used this approach in Oracle warehouses so far, but I may be involved in a PostgreSQL warehouse project this summer.