Is there any difference between GROUP BY and DISTINCT?


MusiGenesis' response is functionally the correct one with regard to your question as stated: SQL Server is smart enough to realize that if you are using "Group By" without any aggregate functions, then what you actually mean is "Distinct", and it therefore generates an execution plan as if you'd simply used "Distinct".
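To make that equivalence concrete, here is a minimal sketch. The visits table and its rows are hypothetical, made up purely for illustration. Both queries should return the same set of rows; whether the plans are also identical is something you'd want to check on your own server and data.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE visits (country VARCHAR(20));
INSERT INTO visits (country) VALUES ('DE'), ('FR'), ('DE'), ('US'), ('FR');

-- GROUP BY with no aggregate functions: one row per distinct country.
SELECT country
FROM visits
GROUP BY country;

-- DISTINCT: the same three rows (DE, FR, US); row order is not guaranteed.
SELECT DISTINCT country
FROM visits;
```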

However, I think it's important to note Hank's response as well - cavalier treatment of "Group By" and "Distinct" could lead to some pernicious gotchas down the line if you're not careful. It's not entirely correct to say that this is "not a question about aggregates" because you're asking about the functional difference between two SQL query keywords, one of which is meant to be used with aggregates and one of which is not.

A hammer can work to drive in a screw sometimes, but if you've got a screwdriver handy, why bother?

(For the purposes of this analogy, Hammer : Screwdriver :: GroupBy : Distinct, and screw => getting a list of unique values in a table column.)


GROUP BY lets you apply aggregate functions, like AVG, MAX, MIN, SUM, and COUNT, to each group of rows. DISTINCT, on the other hand, just removes duplicates.

For example, if you have a bunch of purchase records, and you want to know how much was spent by each department, you might do something like:

SELECT department, SUM(amount) FROM purchases GROUP BY department

This will give you one row per department, containing the department name and the sum of all of the amount values in all rows for that department.
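For contrast, here is a rough sketch with a few made-up rows. The column types and the data are assumptions for illustration only. It shows the GROUP BY query next to what DISTINCT would do with the same columns.

```sql
-- Hypothetical rows for the purchases table described above.
CREATE TABLE purchases (department VARCHAR(20), amount DECIMAL(10, 2));
INSERT INTO purchases (department, amount) VALUES
  ('IT', 100.00), ('IT', 250.00), ('HR', 75.00), ('HR', 75.00);

-- One row per department with its total: IT -> 350.00, HR -> 150.00.
SELECT department, SUM(amount) AS total
FROM purchases
GROUP BY department;

-- DISTINCT only collapses identical rows; it cannot total anything.
-- Returns three rows: (IT, 100.00), (IT, 250.00), (HR, 75.00).
SELECT DISTINCT department, amount
FROM purchases;
```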


What's the difference from a mere duplicate removal functionality point of view?

Apart from the fact that unlike DISTINCT, GROUP BY allows for aggregating data per group (which has been mentioned by many other answers), the most important difference in my opinion is the fact that the two operations "happen" at two very different steps in the logical order of operations that are executed in a SELECT statement.

Here are the most important operations:

  • FROM (including JOIN, APPLY, etc.)
  • WHERE
  • GROUP BY (can remove duplicates)
  • Aggregations
  • HAVING
  • Window functions
  • SELECT
  • DISTINCT (can remove duplicates)
  • UNION, INTERSECT, EXCEPT (can remove duplicates)
  • ORDER BY
  • OFFSET
  • LIMIT

As you can see, the logical order of each operation influences what can be done with it and how it influences subsequent operations. In particular, the fact that the GROUP BY operation "happens before" the SELECT operation (the projection) means that:

  1. It doesn't depend on the projection (which can be an advantage)
  2. It cannot use any values from the projection (which can be a disadvantage)
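To make the ordering in the list above concrete, here's a rough sketch that annotates a single query with the logical step each clause belongs to. The orders table and its rows are made up for illustration, and the query assumes a database with window function support.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE orders (region VARCHAR(10), amount INT);
INSERT INTO orders (region, amount) VALUES
  ('north', 10), ('north', 30), ('south', 5), ('south', 5), ('east', 50);

-- Comments give each clause's position in the logical order, not the written order.
SELECT region,                                        -- 7. SELECT (projection)
       SUM(amount) AS total,                          -- 4. aggregation, once per group
       RANK() OVER (ORDER BY SUM(amount) DESC) AS pos -- 6. window function over the grouped rows
FROM orders                                           -- 1. FROM
WHERE amount > 0                                      -- 2. WHERE filters individual rows
GROUP BY region                                       -- 3. GROUP BY: one row per region
HAVING SUM(amount) > 10                               -- 5. HAVING filters whole groups (drops south, total 10)
ORDER BY total DESC;                                  -- 10. ORDER BY can see the alias from SELECT

-- Result: (east, 50, 1), (north, 40, 2)
```

Note that the window function only ranks the two groups that survive HAVING, which is what you'd expect if it really runs after grouping and filtering.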

1. It doesn't depend on the projection

An example where not depending on the projection is useful is if you want to calculate window functions on distinct values:

SELECT rating, row_number() OVER (ORDER BY rating) AS rn
FROM film
GROUP BY rating

When run against the Sakila database, this yields:

rating   rn
-----------
G        1
NC-17    2
PG       3
PG-13    4
R        5

The same couldn't be achieved with DISTINCT easily:

SELECT DISTINCT rating, row_number() OVER (ORDER BY rating) AS rn
FROM film

That query is "wrong" and yields something like:

rating   rn
------------
G        1
G        2
G        3
...
G        178
NC-17    179
NC-17    180
...

This is not what we wanted. The DISTINCT operation "happens after" the projection, so it can no longer remove duplicate ratings: the window function was already calculated and projected, which makes every (rating, rn) row unique. In order to use DISTINCT, we'd have to nest that part of the query:

SELECT rating, row_number() OVER (ORDER BY rating) AS rn
FROM (
  SELECT DISTINCT rating FROM film
) f

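Side-note: In this particular case, we could also use DENSE_RANK(), which assigns the same number to equal ratings, so the duplicate rows stay identical and DISTINCT can remove them: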

SELECT DISTINCT rating, dense_rank() OVER (ORDER BY rating) AS rn
FROM film
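If you don't have Sakila at hand, the three behaviours can be reproduced on a tiny stand-in table. This is only a sketch: film_demo and its five rows are hypothetical, and it assumes a database with window function support.

```sql
-- Hypothetical stand-in for Sakila's film table, for illustration only.
CREATE TABLE film_demo (title VARCHAR(20), rating VARCHAR(5));
INSERT INTO film_demo (title, rating) VALUES
  ('A', 'G'), ('B', 'G'), ('C', 'PG'), ('D', 'R'), ('E', 'R');

-- GROUP BY runs before the window function: 3 rows, rn = 1, 2, 3.
SELECT rating, ROW_NUMBER() OVER (ORDER BY rating) AS rn
FROM film_demo
GROUP BY rating;

-- DISTINCT runs after it: each (rating, rn) pair is already unique,
-- so all 5 rows remain.
SELECT DISTINCT rating, ROW_NUMBER() OVER (ORDER BY rating) AS rn
FROM film_demo;

-- DENSE_RANK gives tied ratings the same number, so DISTINCT
-- collapses them back to 3 rows: (G, 1), (PG, 2), (R, 3).
SELECT DISTINCT rating, DENSE_RANK() OVER (ORDER BY rating) AS rn
FROM film_demo;
```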

2. It cannot use any values from the projection

One of SQL's drawbacks is its verbosity at times. For the same reason as what we've seen before (namely the logical order of operations), we cannot "easily" group by something we're projecting.

This is invalid in standard SQL (although some dialects, such as MySQL and PostgreSQL, accept it as an extension):

SELECT first_name || ' ' || last_name AS name
FROM customer
GROUP BY name

This is valid (repeating the expression):

SELECT first_name || ' ' || last_name AS name
FROM customer
GROUP BY first_name || ' ' || last_name

This is valid, too (nesting the expression):

SELECT name
FROM (
  SELECT first_name || ' ' || last_name AS name
  FROM customer
) c
GROUP BY name

I've written about this topic more in depth in a blog post