PostgreSQL - how should I use first_value()? PostgreSQL - how should I use first_value()? postgresql postgresql

PostgreSQL - how should I use first_value()?


SQL Fiddle

All the functions you used act on the window frame, not on the partition. If omitted the frame end is the current row. To make the window frame to be the whole partition declare it in the frame clause (range...):

SELECT      cstamp,    price,    date_trunc('hour',cstamp) AS h,    floor(EXTRACT(minute FROM cstamp) / 5) AS m5,    min(price) OVER w,    max(price) OVER w,    first_value(price) OVER w,    last_value(price) OVER wFROM tradesWhere date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'WINDOW w AS (    PARTITION BY date_trunc('hour',cstamp) , floor(extract(minute FROM cstamp) / 5)    ORDER BY cstamp    range between unbounded preceding and unbounded following    )ORDER BY cstamp;


Here's a quick query to illustrate the behaviour:

select   v,  first_value(v) over w1 f1,  first_value(v) over w2 f2,  first_value(v) over w3 f3,  last_value (v) over w1 l1,  last_value (v) over w2 l2,  last_value (v) over w3 l3,  max        (v) over w1 m1,  max        (v) over w2 m2,  max        (v) over w3 m3,  max        (v) over () m4from (values(1),(2),(3),(4)) t(v)window  w1 as (order by v),  w2 as (order by v rows between unbounded preceding and current row),  w3 as (order by v rows between unbounded preceding and unbounded following)

The output of the above query can be seen here (SQLFiddle here):

| V | F1 | F2 | F3 | L1 | L2 | L3 | M1 | M2 | M3 | M4 ||---|----|----|----|----|----|----|----|----|----|----|| 1 |  1 |  1 |  1 |  1 |  1 |  4 |  1 |  1 |  4 |  4 || 2 |  1 |  1 |  1 |  2 |  2 |  4 |  2 |  2 |  4 |  4 || 3 |  1 |  1 |  1 |  3 |  3 |  4 |  3 |  3 |  4 |  4 || 4 |  1 |  1 |  1 |  4 |  4 |  4 |  4 |  4 |  4 |  4 |

Few people think of the implicit frames that are applied to window functions that take an ORDER BY clause. In this case, windows are defaulting to the frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Think about it this way:

  • On the row with v = 1 the ordered window's frame spans v IN (1)
  • On the row with v = 2 the ordered window's frame spans v IN (1, 2)
  • On the row with v = 3 the ordered window's frame spans v IN (1, 2, 3)
  • On the row with v = 4 the ordered window's frame spans v IN (1, 2, 3, 4)

If you want to prevent that behaviour, you have two options:

  • Use an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause for ordered window functions
  • Use no ORDER BY clause in those window functions that allow for omitting them (as MAX(v) OVER())

More details are explained in this article about LEAD(), LAG(), FIRST_VALUE() and LAST_VALUE()


The result of max() as window function is base on the frame definition.

The default frame definition (with ORDER BY) is from the start of the frame up to the last peer of the current row (including the current row and possibly more rows ranking equally according to ORDER BY). In the absence of ORDER BY (like in my answer you are referring to), or if ORDER BY treats every row in the partition as equal (like in your first example), all rows in the partition are peers, and max() produces the same result for every row in the partition, effectively considering all rows of the partition.

Per documentation:

The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last peer. Without ORDER BY, all rows of the partition are included in the window frame, since all rows become peers of the current row.

Bold emphasis mine.

The simple solution would be to omit the ORDER BY in the window definition - just like I demonstrated in the example you are referring to.

All the gory details about frame specifications in the chapter Window Function Calls in the manual.