SQL - When would an empty OVER clause be used? SQL - When would an empty OVER clause be used? sql sql

SQL - When would an empty OVER clause be used?


OVER() is part of analytic function and define partitions in your recordset. OVER() is just one partition and applied to the whole dataset

i.e. COUNT(*) OVER() will return in each row how many records in your dataset.

look to this http://msdn.microsoft.com/en-us/library/ms189461.aspx


Say our table is employees:

+-----------+-------+---------+| badge_num | name  | surname |+-----------+-------+---------+|         1 | John  | Smith   ||         2 | Mark  | Pence   ||         3 | Steve | Smith   ||         4 | Bob   | Smith   |+-----------+-------+---------+

Running

SELECT surname, COUNT(*)FROM employeesGROUP BY surname;

we'll get:

+---------+----------+| surname | COUNT(*) |+---------+----------+| Smith   |        3 || Pence   |        1 |+---------+----------+

While running

SELECT surname, COUNT(*) OVER()FROM employeesGROUP BY surname;

we'll get:

+---------+-----------------+| surname | COUNT(*) OVER() |+---------+-----------------+| Smith   |               2 || Pence   |               2 |+---------+-----------------+

In the second case, in each row we are just counting the number of rows of the whole select (not the single partition).


To summarize things, the OVER clause can be used with Ranking Functions(Rank, Row_Number, Dense_Rank..), Aggregate Functions like (AVG, Max, Min, SUM...etc) and Analytics Functions like (First_Value, Last_Value, and few others).

Let's See basic syntax of OVER clause

OVER (          [ <PARTITION BY clause> ]         [ <ORDER BY clause> ]          [ <ROW or RANGE clause> ]        )  

PARTITION BY:It is used to partition data and perform operations on groups with the same data.

ORDER BY:It is used to define the logical order of data in Partitions. When we don't specify Partition, entire resultset is considered as a single partition

:This can be used to specify what rows are supposed to be considered in a partition when performing the operation.

Let's take an example:

Here is my dataset:

Id          Name                                               Gender     Salary----------- -------------------------------------------------- ---------- -----------1           Mark                                               Male       50002           John                                               Male       45003           Pavan                                              Male       50004           Pam                                                Female     55005           Sara                                               Female     40006           Aradhya                                            Female     35007           Tom                                                Male       55008           Mary                                               Female     50009           Ben                                                Male       650010          Jodi                                               Female     700011          Tom                                                Male       550012          Ron                                                Male       5000

So let me execute different scenarios and see how data is impacted and I'll come from difficult syntax to simple one

Select *,SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employeesId          Name                                               Gender     Salary      sum_sal----------- -------------------------------------------------- ---------- ----------- -----------6           Aradhya                                            Female     3500        35005           Sara                                               Female     4000        75002           John                                               Male       4500        120003           Pavan                                              Male       5000        320001           Mark                                               Male       5000        320008           Mary                                               Female     5000        3200012          Ron                                                Male       5000        3200011          Tom                                                Male       5500        485007           Tom                                                Male       5500        485004           Pam                                                Female     5500        485009           Ben                                                Male       6500        5500010          Jodi                                               Female     7000        62000

Just observe the sum_sal part. Here I am using order by Salary and using "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW".In this case, we are not using partition so entire data will be treated as one partition and we are ordering on salary.And the important thing here is UNBOUNDED PRECEDING AND CURRENT ROW. This means when we are calculating the sum, from starting row to the current row for each row.But if we see rows with salary 5000 and name="Pavan", ideally it should be 17000 and for salary=5000 and name=Mark, it should be 22000. But as we are using RANGE and in this case, if it finds any similar elements then it considers them as the same logical group and performs an operation on them and assigns value to each item in that group. That is the reason why we have the same value for salary=5000. The engine went up to salary=5000 and Name=Ron and calculated sum and then assigned it to all salary=5000.

Select *,SUM(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees   Id          Name                                               Gender     Salary      sum_sal----------- -------------------------------------------------- ---------- ----------- -----------6           Aradhya                                            Female     3500        35005           Sara                                               Female     4000        75002           John                                               Male       4500        120003           Pavan                                              Male       5000        170001           Mark                                               Male       5000        220008           Mary                                               Female     5000        2700012          Ron                                                Male       5000        3200011          Tom                                                Male       5500        375007           Tom                                                Male       5500        430004           Pam                                                Female     5500        485009           Ben                                                Male       6500        5500010          Jodi                                               Female     7000        62000

So with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW The difference is for same value items instead of grouping them together, It calculates SUM from starting row to current row and it doesn't treat items with same value differently like RANGE

Select *,SUM(salary) Over(order by salary) as sum_sal from employeesId          Name                                               Gender     Salary      sum_sal----------- -------------------------------------------------- ---------- ----------- -----------6           Aradhya                                            Female     3500        35005           Sara                                               Female     4000        75002           John                                               Male       4500        120003           Pavan                                              Male       5000        320001           Mark                                               Male       5000        320008           Mary                                               Female     5000        3200012          Ron                                                Male       5000        3200011          Tom                                                Male       5500        485007           Tom                                                Male       5500        485004           Pam                                                Female     5500        485009           Ben                                                Male       6500        5500010          Jodi                                               Female     7000        62000

These results are the same as

Select *, SUM(salary) Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_sal from employees

That is because Over(order by salary) is just a short cut of Over(order by salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)So wherever we simply specify Order by without ROWS or RANGE it is taking RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW as default.

Note: This is applicable only to Functions that actually accept RANGE/ROW. For example, ROW_NUMBER and few others don't accept RANGE/ROW and in that case, this doesn't come into the picture.

Till now we saw that Over clause with an order by is taking Range/ROWS and syntax looks something like this RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWAnd it is actually calculating up to the current row from the first row. But what If it wants to calculate values for the entire partition of data and have it for each column (that is from 1st row to last row). Here is the query for that

Select *,sum(salary) Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_sal from employeesId          Name                                               Gender     Salary      sum_sal----------- -------------------------------------------------- ---------- ----------- -----------1           Mark                                               Male       5000        620002           John                                               Male       4500        620003           Pavan                                              Male       5000        620004           Pam                                                Female     5500        620005           Sara                                               Female     4000        620006           Aradhya                                            Female     3500        620007           Tom                                                Male       5500        620008           Mary                                               Female     5000        620009           Ben                                                Male       6500        6200010          Jodi                                               Female     7000        6200011          Tom                                                Male       5500        6200012          Ron                                                Male       5000        62000

Instead of CURRENT ROW, I am specifying UNBOUNDED FOLLOWING which instructs the engine to calculate till the last record of partition for each row.

Now coming to your point on what is OVER() with empty braces?

It is just a short cut for Over(order by salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

Here we are indirectly specifying to treat all my resultset as a single partition and then perform calculations from the first record to the last record of each partition.

Select *,Sum(salary) Over() as sum_sal from employeesId          Name                                               Gender     Salary      sum_sal----------- -------------------------------------------------- ---------- ----------- -----------1           Mark                                               Male       5000        620002           John                                               Male       4500        620003           Pavan                                              Male       5000        620004           Pam                                                Female     5500        620005           Sara                                               Female     4000        620006           Aradhya                                            Female     3500        620007           Tom                                                Male       5500        620008           Mary                                               Female     5000        620009           Ben                                                Male       6500        6200010          Jodi                                               Female     7000        6200011          Tom                                                Male       5500        6200012          Ron                                                Male       5000        62000

I did create a video on this and if you are interested you can visit it.https://www.youtube.com/watch?v=CvVenuVUqto&t=1177s

Thanks,Pavan Kumar AryasomayajuluHTTP://xyzcoder.github.io