PostgreSQL - how should I use first_value()?
All the functions you used act on the window frame, not on the partition. If omitted the frame end is the current row. To make the window frame to be the whole partition declare it in the frame clause (range...
):
SELECT cstamp, price, date_trunc('hour',cstamp) AS h, floor(EXTRACT(minute FROM cstamp) / 5) AS m5, min(price) OVER w, max(price) OVER w, first_value(price) OVER w, last_value(price) OVER wFROM tradesWhere date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'WINDOW w AS ( PARTITION BY date_trunc('hour',cstamp) , floor(extract(minute FROM cstamp) / 5) ORDER BY cstamp range between unbounded preceding and unbounded following )ORDER BY cstamp;
Here's a quick query to illustrate the behaviour:
select v, first_value(v) over w1 f1, first_value(v) over w2 f2, first_value(v) over w3 f3, last_value (v) over w1 l1, last_value (v) over w2 l2, last_value (v) over w3 l3, max (v) over w1 m1, max (v) over w2 m2, max (v) over w3 m3, max (v) over () m4from (values(1),(2),(3),(4)) t(v)window w1 as (order by v), w2 as (order by v rows between unbounded preceding and current row), w3 as (order by v rows between unbounded preceding and unbounded following)
The output of the above query can be seen here (SQLFiddle here):
| V | F1 | F2 | F3 | L1 | L2 | L3 | M1 | M2 | M3 | M4 ||---|----|----|----|----|----|----|----|----|----|----|| 1 | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 1 | 4 | 4 || 2 | 1 | 1 | 1 | 2 | 2 | 4 | 2 | 2 | 4 | 4 || 3 | 1 | 1 | 1 | 3 | 3 | 4 | 3 | 3 | 4 | 4 || 4 | 1 | 1 | 1 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
Few people think of the implicit frames that are applied to window functions that take an ORDER BY
clause. In this case, windows are defaulting to the frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. Think about it this way:
- On the row with
v = 1
the ordered window's frame spansv IN (1)
- On the row with
v = 2
the ordered window's frame spansv IN (1, 2)
- On the row with
v = 3
the ordered window's frame spansv IN (1, 2, 3)
- On the row with
v = 4
the ordered window's frame spansv IN (1, 2, 3, 4)
If you want to prevent that behaviour, you have two options:
- Use an explicit
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
clause for ordered window functions - Use no
ORDER BY
clause in those window functions that allow for omitting them (asMAX(v) OVER()
)
More details are explained in this article about LEAD()
, LAG()
, FIRST_VALUE()
and LAST_VALUE()
The result of max()
as window function is base on the frame definition.
The default frame definition (with ORDER BY
) is from the start of the frame up to the last peer of the current row (including the current row and possibly more rows ranking equally according to ORDER BY
). In the absence of ORDER BY
(like in my answer you are referring to), or if ORDER BY
treats every row in the partition as equal (like in your first example), all rows in the partition are peers, and max()
produces the same result for every row in the partition, effectively considering all rows of the partition.
The default framing option is
RANGE UNBOUNDED PRECEDING
, which is the same asRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. WithORDER BY
, this sets the frame to be all rows from the partition start up through the current row's last peer. WithoutORDER BY
, all rows of the partition are included in the window frame, since all rows become peers of the current row.
Bold emphasis mine.
The simple solution would be to omit the ORDER BY
in the window definition - just like I demonstrated in the example you are referring to.
All the gory details about frame specifications in the chapter Window Function Calls in the manual.