Is there any difference between GROUP BY and DISTINCT

sql group-by distinct

MusiGenesis' response is functionally the correct one with regard to your question as stated. SQL Server is smart enough to realize that if you use GROUP BY without any aggregate functions, what you actually mean is DISTINCT, and it therefore generates an execution plan as if you had simply used DISTINCT.
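A minimal sketch of the equivalence follows. It assumes a hypothetical `purchases` table and runs the queries through Python's built-in `sqlite3` module. SQLite is not SQL Server, so this shows nothing about SQL Server's execution plans. It only shows that, without an aggregate, both queries return the same set of rows.

```python
import sqlite3

# Hypothetical toy table; the table name, columns and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (department TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("IT", 100), ("HR", 50), ("IT", 25), ("Sales", 70), ("HR", 10)],
)

# DISTINCT: remove duplicate department values from the result.
distinct_rows = conn.execute(
    "SELECT DISTINCT department FROM purchases ORDER BY department"
).fetchall()

# GROUP BY without any aggregate function: one row per group.
group_rows = conn.execute(
    "SELECT department FROM purchases GROUP BY department ORDER BY department"
).fetchall()

print(distinct_rows)                # [('HR',), ('IT',), ('Sales',)]
print(group_rows == distinct_rows)  # True
```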

However, I think it's important to note Hank's response as well: cavalier treatment of GROUP BY and DISTINCT could lead to some pernicious gotchas down the line if you're not careful. It's not entirely correct to say that this is "not a question about aggregates", because you're asking about the functional difference between two SQL query keywords, one of which is meant to be used with aggregates and one of which is not.

A hammer can work to drive in a screw sometimes, but if you've got a screwdriver handy, why bother?

(For the purposes of this analogy, Hammer : Screwdriver :: GROUP BY : DISTINCT, and screw => getting a list of unique values in a table column.)


GROUP BY lets you use aggregate functions, like AVG, MAX, MIN, SUM, and COUNT. DISTINCT, on the other hand, just removes duplicates.

For example, if you have a bunch of purchase records, and you want to know how much was spent by each department, you might do something like:

SELECT department, SUM(amount) FROM purchases GROUP BY department

This will give you one row per department, containing the department name and the sum of all of the amount values in all rows for that department.
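The department example can be run end to end with a sketch like the following. It uses Python's `sqlite3` module and made-up purchase rows. The `ORDER BY` clauses are added only to make the output order predictable.

```python
import sqlite3

# Hypothetical purchase records: (department, amount).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (department TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("IT", 100), ("HR", 50), ("IT", 25), ("Sales", 70), ("HR", 10)],
)

# One row per department, with the total amount spent by that department.
totals = conn.execute(
    "SELECT department, SUM(amount) FROM purchases "
    "GROUP BY department ORDER BY department"
).fetchall()
print(totals)  # [('HR', 60), ('IT', 125), ('Sales', 70)]

# DISTINCT alone can only list the departments; it cannot compute the sums.
departments = conn.execute(
    "SELECT DISTINCT department FROM purchases ORDER BY department"
).fetchall()
print(departments)  # [('HR',), ('IT',), ('Sales',)]
```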


What's the difference from a mere duplicate-removal functionality point of view?

Apart from the fact that unlike DISTINCT, GROUP BY allows for aggregating data per group (which has been mentioned by many other answers), the most important difference in my opinion is the fact that the two operations "happen" at two very different steps in the logical order of operations that are executed in a SELECT statement.

Here are the most important operations:

FROM (including JOIN, APPLY, etc.)
WHERE
GROUP BY (can remove duplicates)
Aggregations
HAVING
Window functions
SELECT
DISTINCT (can remove duplicates)
UNION, INTERSECT, EXCEPT (can remove duplicates)
ORDER BY
OFFSET
LIMIT

As you can see, the logical order of each operation influences what can be done with it and how it influences subsequent operations. In particular, the fact that the GROUP BY operation "happens before" the SELECT operation (the projection) means that:

It doesn't depend on the projection (which can be an advantage)
It cannot use any values from the projection (which can be a disadvantage)
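A small sketch of what "before" versus "after" the projection means for duplicate removal follows. It assumes a hypothetical one-column table whose names differ only in letter case, run through Python's `sqlite3`. It relies on SQLite's default case-sensitive (BINARY) text comparison. Under a case-insensitive collation, `'anna'` and `'Anna'` would already form one group.

```python
import sqlite3

# Hypothetical table whose names differ only in letter case.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("anna",), ("Anna",), ("bob",)])

# GROUP BY groups on the source column before the projection runs,
# so 'anna' and 'Anna' stay separate groups and UPPER() yields 'ANNA' twice.
grouped = conn.execute(
    "SELECT UPPER(name) FROM t GROUP BY name ORDER BY 1"
).fetchall()

# DISTINCT removes duplicates after the projection, i.e. on UPPER(name).
distinct = conn.execute(
    "SELECT DISTINCT UPPER(name) FROM t ORDER BY 1"
).fetchall()

print(grouped)   # [('ANNA',), ('ANNA',), ('BOB',)]
print(distinct)  # [('ANNA',), ('BOB',)]
```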

1. It doesn't depend on the projection

An example where not depending on the projection is useful is if you want to calculate window functions on distinct values:

SELECT rating, row_number() OVER (ORDER BY rating) AS rn
FROM film
GROUP BY rating

When run against the Sakila database, this yields:

rating   rn
-----------
G        1
NC-17    2
PG       3
PG-13    4
R        5
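Without the Sakila database at hand, the same shape of result can be reproduced with a sketch like this one. It uses a hypothetical eight-row stand-in for the `film` table (not the real Sakila data) and Python's `sqlite3`, whose SQLite must be 3.25 or newer for window functions. The outer `ORDER BY rn` only fixes the output order.

```python
import sqlite3

# Hypothetical miniature stand-in for Sakila's film table: 8 films, 5 ratings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title TEXT, rating TEXT)")
conn.executemany(
    "INSERT INTO film VALUES (?, ?)",
    [("f1", "G"), ("f2", "G"), ("f3", "PG"), ("f4", "NC-17"),
     ("f5", "R"), ("f6", "PG-13"), ("f7", "PG"), ("f8", "R")],
)

# GROUP BY collapses the ratings first; the window function then numbers the groups.
rows = conn.execute(
    "SELECT rating, row_number() OVER (ORDER BY rating) AS rn "
    "FROM film GROUP BY rating ORDER BY rn"
).fetchall()
print(rows)  # [('G', 1), ('NC-17', 2), ('PG', 3), ('PG-13', 4), ('R', 5)]
```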

The same couldn't be achieved with DISTINCT easily:

SELECT DISTINCT rating, row_number() OVER (ORDER BY rating) AS rn
FROM film

That query is "wrong" and yields something like:

rating   rn
------------
G        1
G        2
G        3
...
G        178
NC-17    179
NC-17    180
...

This is not what we wanted. The DISTINCT operation "happens after" the projection, so by then the window function has already been calculated and projected: every row carries its own row number, no two rows are duplicates, and DISTINCT has nothing left to remove. In order to use DISTINCT, we'd have to nest that part of the query:

SELECT rating, row_number() OVER (ORDER BY rating) AS rn
FROM (
  SELECT DISTINCT rating FROM film
) f
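The contrast between the flat DISTINCT query and the nested one can be checked with a sketch like this one. It reuses a hypothetical eight-film table in Python's `sqlite3` (SQLite 3.25+ assumed). The flat query keeps one row per film, while the nested query numbers only the distinct ratings.

```python
import sqlite3

# Hypothetical miniature film table: 8 films, 5 distinct ratings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title TEXT, rating TEXT)")
conn.executemany(
    "INSERT INTO film VALUES (?, ?)",
    [("f1", "G"), ("f2", "G"), ("f3", "PG"), ("f4", "NC-17"),
     ("f5", "R"), ("f6", "PG-13"), ("f7", "PG"), ("f8", "R")],
)

# DISTINCT runs after the projection: each row already has its own rn,
# so nothing is a duplicate and all 8 rows come back.
flat = conn.execute(
    "SELECT DISTINCT rating, row_number() OVER (ORDER BY rating) AS rn "
    "FROM film ORDER BY rn"
).fetchall()

# Nesting applies DISTINCT first, then numbers the 5 remaining ratings.
nested = conn.execute(
    "SELECT rating, row_number() OVER (ORDER BY rating) AS rn "
    "FROM (SELECT DISTINCT rating FROM film) f ORDER BY rn"
).fetchall()

print(len(flat))  # 8
print(nested)     # [('G', 1), ('NC-17', 2), ('PG', 3), ('PG-13', 4), ('R', 5)]
```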

Side note: in this particular case, we could also use DENSE_RANK():

SELECT DISTINCT rating, dense_rank() OVER (ORDER BY rating) AS rn
FROM film
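Why DENSE_RANK() works here can be seen in a short sketch. It uses the same kind of hypothetical eight-film table in Python's `sqlite3` (SQLite 3.25+ assumed). Equal ratings receive equal ranks, so the projected rows really are duplicates, and DISTINCT can remove them even though it runs after the projection.

```python
import sqlite3

# Hypothetical miniature film table: 8 films, 5 distinct ratings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title TEXT, rating TEXT)")
conn.executemany(
    "INSERT INTO film VALUES (?, ?)",
    [("f1", "G"), ("f2", "G"), ("f3", "PG"), ("f4", "NC-17"),
     ("f5", "R"), ("f6", "PG-13"), ("f7", "PG"), ("f8", "R")],
)

# dense_rank() gives ties the same value, so e.g. both 'G' rows project to ('G', 1)
# and DISTINCT collapses them after the projection.
rows = conn.execute(
    "SELECT DISTINCT rating, dense_rank() OVER (ORDER BY rating) AS rn "
    "FROM film ORDER BY rn"
).fetchall()
print(rows)  # [('G', 1), ('NC-17', 2), ('PG', 3), ('PG-13', 4), ('R', 5)]
```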

2. It cannot use any values from the projection

One of SQL's drawbacks is its verbosity at times. For the same reason as what we've seen before (namely the logical order of operations), we cannot "easily" group by something we're projecting.

This is invalid in standard SQL (some dialects, such as MySQL and PostgreSQL, accept it as an extension):

SELECT first_name || ' ' || last_name AS name
FROM customer
GROUP BY name

This is valid (repeating the expression):

SELECT first_name || ' ' || last_name AS name
FROM customer
GROUP BY first_name || ' ' || last_name

This is valid, too (nesting the expression):

SELECT name
FROM (
  SELECT first_name || ' ' || last_name AS name
  FROM customer
) c
GROUP BY name
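The two portable forms can be checked against each other with a sketch like the following. It uses a hypothetical `customer` table with made-up names in Python's `sqlite3`. SQLite, like some other dialects, also accepts `GROUP BY name` on the alias directly. This sketch therefore only confirms that the repeated-expression form and the nested form agree.

```python
import sqlite3

# Hypothetical customer rows, including one duplicated full name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Mary", "Smith"), ("Mary", "Smith"), ("Linda", "Williams")],
)

# Variant 1: repeat the projected expression in GROUP BY.
repeated = conn.execute(
    "SELECT first_name || ' ' || last_name AS name FROM customer "
    "GROUP BY first_name || ' ' || last_name ORDER BY name"
).fetchall()

# Variant 2: compute the expression in a derived table, then group by its column.
nested = conn.execute(
    "SELECT name FROM ("
    "  SELECT first_name || ' ' || last_name AS name FROM customer"
    ") c GROUP BY name ORDER BY name"
).fetchall()

print(repeated)            # [('Linda Williams',), ('Mary Smith',)]
print(nested == repeated)  # True
```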

I've written about this topic in more depth in a blog post.

CodeHunter
