What is the fastest way to truncate timestamps to 5 minutes in Postgres?


I was wondering the same thing. I found two alternative ways of doing this, but the one you suggested was faster.

I informally benchmarked both against one of our larger tables, limiting each query to the first 4 million rows. I alternated between the two queries to avoid giving either one an unfair advantage from database caching.


Going through epoch/unix time

SELECT to_timestamp(
           floor(EXTRACT(epoch FROM ht.time) / EXTRACT(epoch FROM interval '5 min'))
           * EXTRACT(epoch FROM interval '5 min')
       )
FROM huge_table AS ht
LIMIT 4000000

(Note that this produces timestamptz even if you used a time-zone-unaware data type.)
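
To illustrate the note about the result type, here is a minimal sketch that runs the same expression on a single literal value instead of a table. The sample timestamp is made up, and appending AT TIME ZONE 'UTC' is only one possible way (my assumption, not part of the benchmarked query) to get a plain timestamp back when the input column has no time zone.

```sql
-- Illustrative only: a literal value stands in for huge_table.time.
SELECT pg_typeof(bucket_tz)         AS type_without_conversion,  -- timestamp with time zone
       bucket_tz AT TIME ZONE 'UTC' AS bucket                    -- plain timestamp again
FROM (
    SELECT to_timestamp(
               floor(EXTRACT(epoch FROM ts) / EXTRACT(epoch FROM interval '5 min'))
               * EXTRACT(epoch FROM interval '5 min')
           ) AS bucket_tz
    FROM (VALUES (timestamp '2017-03-01 12:07:42')) AS t(ts)
) AS s;
-- bucket: 2017-03-01 12:05:00
```

For a timestamp without time zone, EXTRACT(epoch ...) yields nominal seconds without regard to time zone, which is why converting back at UTC returns the expected wall-clock value. The extra conversion was not part of the timings below.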

Results

  • Run 1: 39.368 seconds
  • Run 3: 39.526 seconds
  • Run 5: 39.883 seconds

Using date_trunc and date_part

SELECT date_trunc('hour', ht.time)
       + date_part('minute', ht.time)::int / 5 * interval '5 min'
FROM huge_table AS ht
LIMIT 4000000
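
The expression relies on integer division: the minute is cast to int, divided by 5 (dropping the remainder), and multiplied back as 5-minute steps. A small sketch on made-up literal values, with no table required, shows the intermediate slot number next to the final bucket:

```sql
-- Illustrative values only; the "slot" column is added just to show the arithmetic.
SELECT ts,
       date_part('minute', ts)::int / 5 AS slot,   -- integer division, 0..11
       date_trunc('hour', ts)
       + date_part('minute', ts)::int / 5 * interval '5 min' AS bucket
FROM (VALUES (timestamp '2017-03-01 12:03:59'),
             (timestamp '2017-03-01 12:07:00'),
             (timestamp '2017-03-01 12:59:59')) AS t(ts);
-- slot:   0, 1, 11
-- bucket: 12:00:00, 12:05:00, 12:55:00 (all on 2017-03-01)
```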

Results

  • Run 2: 34.189 seconds
  • Run 4: 37.028 seconds
  • Run 6: 32.397 seconds

System

  • DB version: PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
  • Cores: Intel® Xeon®, E5-1650v2, Hexa-Core
  • RAM: 64 GB, DDR3 ECC RAM

Conclusion

Your version seems to be faster, though still not fast enough for my specific use case. The epoch version has the advantage of not requiring you to specify the hour, which makes it more versatile and simpler to parameterize in client-side code: it handles 2-hour intervals just as well as 5-minute intervals without having to bump up the date_trunc time unit argument. As a final note, I wish this time unit argument were a time interval argument instead.
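
As a rough sketch of that versatility, the epoch expression below takes its step from a single interval value, so switching from 5 minutes to 2 hours changes one literal only. The params CTE, the sample timestamp, and the AT TIME ZONE 'UTC' conversion are illustrative assumptions. The second statement uses date_bin, which accepts an interval directly but exists only in PostgreSQL 14 and later, so it is not available on the 9.6.2 server used for the benchmark and was not timed here.

```sql
-- The step is written once; the bucketing expression itself never changes.
WITH params AS (SELECT interval '2 hours' AS step)
SELECT ts,
       to_timestamp(
           floor(EXTRACT(epoch FROM ts) / EXTRACT(epoch FROM step))
           * EXTRACT(epoch FROM step)
       ) AT TIME ZONE 'UTC' AS epoch_bucket
FROM params,
     (VALUES (timestamp '2017-03-01 13:47:10')) AS t(ts);
-- epoch_bucket: 2017-03-01 12:00:00

-- PostgreSQL 14+ only: date_bin(stride, source, origin)
SELECT date_bin(interval '2 hours',
                timestamp '2017-03-01 13:47:10',
                timestamp '1970-01-01 00:00:00') AS bucket;
-- bucket: 2017-03-01 12:00:00
```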


I don't think there is any quicker method.

And I don't think you should be worried about the performance of the expression.

Everything else that is involved in executing your (SELECT, UPDATE, ...) statement is most probably a lot more expensive (e.g. the I/O to retrieve rows) than that date/time calculation.
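
One way to check this on your own system, as a rough sketch: time the expression over rows produced by generate_series, so that no table reads are involved, and compare it with a baseline that only generates the rows. The date range (about 2.7 million rows) is arbitrary.

```sql
-- Baseline: produce the rows only.
EXPLAIN ANALYZE
SELECT g
FROM generate_series(timestamp '2017-01-01',
                     timestamp '2017-02-01',
                     interval '1 second') AS g;

-- Same rows plus the truncation expression.
EXPLAIN ANALYZE
SELECT date_trunc('hour', g)
       + date_part('minute', g)::int / 5 * interval '5 min'
FROM generate_series(timestamp '2017-01-01',
                     timestamp '2017-02-01',
                     interval '1 second') AS g;
```

The difference between the two reported execution times approximates the cost of the date/time calculation alone, which can then be set against the total time of the real statement.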


Full query for those wondering (based on @DNS's question):

Assuming you have orders and you want to count them by slices of 5min and shop_id:

SELECT date_trunc('hour', created_at)
       + date_part('minute', created_at)::int / 5 * interval '5 min' AS minute
     , shop_id
     , count(id) AS orders_count
FROM orders
GROUP BY 1, shop_id
ORDER BY 1 ASC
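
To try the grouping without an existing orders table, here is a self-contained sketch in which a CTE with made-up rows stands in for the table. The sample data is hypothetical, and shop_id is added to the ORDER BY only to make the row order deterministic.

```sql
-- Hypothetical sample rows standing in for the real orders table.
WITH orders(id, shop_id, created_at) AS (
    VALUES (1, 10, timestamp '2017-03-01 12:01:00'),
           (2, 10, timestamp '2017-03-01 12:04:59'),
           (3, 10, timestamp '2017-03-01 12:05:00'),
           (4, 20, timestamp '2017-03-01 12:02:30')
)
SELECT date_trunc('hour', created_at)
       + date_part('minute', created_at)::int / 5 * interval '5 min' AS minute
     , shop_id
     , count(id) AS orders_count
FROM orders
GROUP BY 1, shop_id
ORDER BY 1 ASC, shop_id;
-- 2017-03-01 12:00:00 | 10 | 2
-- 2017-03-01 12:00:00 | 20 | 1
-- 2017-03-01 12:05:00 | 10 | 1
```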