SQL Server: Difference between PARTITION BY and GROUP BY SQL Server: Difference between PARTITION BY and GROUP BY sql-server sql-server

SQL Server: Difference between PARTITION BY and GROUP BY


They're used in different places. group by modifies the entire query, like:

select customerId, count(*) as orderCountfrom Ordersgroup by customerId

But partition by just works on a window function, like row_number:

select row_number() over (partition by customerId order by orderId)    as OrderNumberForThisCustomerfrom Orders

A group by normally reduces the number of rows returned by rolling them up and calculating averages or sums for each row. partition by does not affect the number of rows returned, but it changes how a window function's result is calculated.


We can take a simple example.

Consider a table named TableA with the following values:

id  firstname                   lastname                    Mark-------------------------------------------------------------------1   arun                        prasanth                    402   ann                         antony                      453   sruthy                      abc                         416   new                         abc                         471   arun                        prasanth                    451   arun                        prasanth                    492   ann                         antony                      49

GROUP BY

The SQL GROUP BY clause can be used in a SELECT statement to collect data across multiple records and group the results by one or more columns.

In more simple words GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by one or more columns.

Syntax:

SELECT expression1, expression2, ... expression_n,        aggregate_function (aggregate_expression)FROM tablesWHERE conditionsGROUP BY expression1, expression2, ... expression_n;

We can apply GROUP BY in our table:

select SUM(Mark)marksum,firstname from TableAgroup by id,firstName

Results:

marksum  firstname----------------94      ann                      134     arun                     47      new                      41      sruthy   

In our real table we have 7 rows and when we apply GROUP BY id, the server group the results based on id:

In simple words:

here GROUP BY normally reduces the number of rows returned by rolling them up and calculating Sum() for each row.

PARTITION BY

Before going to PARTITION BY, let us look at the OVER clause:

According to the MSDN definition:

OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.

PARTITION BY will not reduce the number of rows returned.

We can apply PARTITION BY in our example table:

SELECT SUM(Mark) OVER (PARTITION BY id) AS marksum, firstname FROM TableA

Result:

marksum firstname -------------------134     arun                     134     arun                     134     arun                     94      ann                      94      ann                      41      sruthy                   47      new  

Look at the results - it will partition the rows and returns all rows, unlike GROUP BY.


partition by doesn't actually roll up the data. It allows you to reset something on a per group basis. For example, you can get an ordinal column within a group by partitioning on the grouping field and using rownum() over the rows within that group. This gives you something that behaves a bit like an identity column that resets at the beginning of each group.