The SQL OVER() clause - when and why is it useful?
You can use GROUP BY SalesOrderID
. The difference is, with GROUP BY you can only have the aggregated values for the columns that are not included in GROUP BY.
In contrast, using windowed aggregate functions instead of GROUP BY, you can retrieve both aggregated and non-aggregated values. That is, although you are not doing that in your example query, you could retrieve both individual OrderQty
values and their sums, counts, averages etc. over groups of same SalesOrderID
s.
Here's a practical example of why windowed aggregates are great. Suppose you need to calculate what percent of a total every value is. Without windowed aggregates you'd have to first derive a list of aggregated values and then join it back to the original rowset, i.e. like this:
SELECT orig.[Partition], orig.Value, orig.Value * 100.0 / agg.TotalValue AS ValuePercentFROM OriginalRowset orig INNER JOIN ( SELECT [Partition], SUM(Value) AS TotalValue FROM OriginalRowset GROUP BY [Partition] ) agg ON orig.[Partition] = agg.[Partition]
Now look how you can do the same with a windowed aggregate:
SELECT [Partition], Value, Value * 100.0 / SUM(Value) OVER (PARTITION BY [Partition]) AS ValuePercentFROM OriginalRowset orig
Much easier and cleaner, isn't it?
The OVER
clause is powerful in that you can have aggregates over different ranges ("windowing"), whether you use a GROUP BY
or not
Example: get count per SalesOrderID
and count of all
SELECT SalesOrderID, ProductID, OrderQty ,COUNT(OrderQty) AS 'Count' ,COUNT(*) OVER () AS 'CountAll'FROM Sales.SalesOrderDetail WHERE SalesOrderID IN(43659,43664)GROUP BY SalesOrderID, ProductID, OrderQty
Get different COUNT
s, no GROUP BY
SELECT SalesOrderID, ProductID, OrderQty ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'CountQtyPerOrder' ,COUNT(OrderQty) OVER(PARTITION BY ProductID) AS 'CountQtyPerProduct', ,COUNT(*) OVER () AS 'CountAllAgain'FROM Sales.SalesOrderDetail WHERE SalesOrderID IN(43659,43664)
If you only wanted to GROUP BY the SalesOrderID then you wouldn't be able to include the ProductID and OrderQty columns in the SELECT clause.
The PARTITION BY clause let's you break up your aggregate functions. One obvious and useful example would be if you wanted to generate line numbers for order lines on an order:
SELECT O.order_id, O.order_date, ROW_NUMBER() OVER(PARTITION BY O.order_id) AS line_item_no, OL.product_idFROM Orders OINNER JOIN Order_Lines OL ON OL.order_id = O.order_id
(My syntax might be off slightly)
You would then get back something like:
order_id order_date line_item_no product_id-------- ---------- ------------ ---------- 1 2011-05-02 1 5 1 2011-05-02 2 4 1 2011-05-02 3 7 2 2011-05-12 1 8 2 2011-05-12 2 1