SQL GROUP BY versus DISTINCT

mysql sql-server performance group-by distinct

GROUP BY maps groups of rows to one row, per distinct value in specific columns, which don't even necessarily have to be in the select-list.

SELECT b, c, d FROM table1 GROUP BY a;

This query is legal in MySQL (with the ONLY_FULL_GROUP_BY SQL mode disabled; it is enabled by default since MySQL 5.7.5), but it is not standard SQL and other brands do not support it. MySQL accepts it and trusts that you know what you're doing: that b, c, and d are selected unambiguously because they are functionally dependent on a.

Microsoft SQL Server and other brands don't allow this query, because they can't easily determine the functional dependencies. Instead, standard SQL requires you to follow the Single-Value Rule, i.e. every column in the select-list must either be named in the GROUP BY clause or else be an argument to a set function.
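As a rough sketch of what the Single-Value Rule asks for, here is the same grouping rewritten so that every select-list column is either in the GROUP BY clause or wrapped in a set function. It uses Python's built-in sqlite3 module only to make it runnable; SQLite is not one of the engines discussed here, and the sample rows and the choice of MIN, MAX and COUNT are assumptions made for illustration.

```python
import sqlite3

# In-memory database with made-up sample rows (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 1000), (1, 11, 101, 1001), (2, 20, 200, 2000)],
)

# a is named in GROUP BY; b, c and d each appear only inside a set function,
# so each output column has exactly one value per group.
rows = conn.execute(
    "SELECT a, MIN(b), MAX(c), COUNT(d) FROM table1 GROUP BY a ORDER BY a"
).fetchall()
print(rows)  # [(1, 10, 101, 2), (2, 20, 200, 1)]
```

Which set function is appropriate depends on what you actually want from each group; the point is only that the query no longer relies on the engine guessing a value for b, c or d.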

DISTINCT, by contrast, always looks at all columns in the select-list, and only those columns. It's a common misconception that DISTINCT allows you to specify the columns:

SELECT DISTINCT(a), b, c FROM table1;

Despite the parentheses making DISTINCT look like a function call, it is not one. It's a query option, and a distinct value in any of the three columns of the select-list will lead to a distinct row in the query result. One of the expressions in this select-list has parentheses around it, but this doesn't affect the result.
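A small sketch to check this, again using Python's built-in sqlite3 module purely as a convenient runner (the sample rows are made up): the query with parentheses and the one without return the same rows, and a = 1 still appears more than once because b differs.

```python
import sqlite3

# In-memory database with made-up sample rows (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 1), (1, 2, 1), (2, 1, 1)],
)

# The parentheses only wrap the expression a; DISTINCT still applies to the whole row.
with_parens = conn.execute(
    "SELECT DISTINCT(a), b, c FROM table1 ORDER BY a, b, c"
).fetchall()
without_parens = conn.execute(
    "SELECT DISTINCT a, b, c FROM table1 ORDER BY a, b, c"
).fetchall()

print(with_parens)                    # [(1, 1, 1), (1, 2, 1), (2, 1, 1)]
print(with_parens == without_parens)  # True
```

Only the exact duplicate row (1, 1, 1) is collapsed; the value a = 1 is not made unique on its own.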

A little (VERY little) empirical data from MS SQL Server, on a couple of random tables from our DB.

For the pattern:

SELECT col1, col2 FROM table GROUP BY col1, col2

and

SELECT DISTINCT col1, col2 FROM table

When there's no covering index for the query, both ways produced the following query plan:

|--Sort(DISTINCT ORDER BY:([table].[col1] ASC, [table].[col2] ASC))
     |--Clustered Index Scan(OBJECT:([db].[dbo].[table].[IX_some_index]))

and when there was a covering index, both produced:

|--Stream Aggregate(GROUP BY:([table].[col1], [table].[col2]))
     |--Index Scan(OBJECT:([db].[dbo].[table].[IX_some_index]), ORDERED FORWARD)

so from that very small sample, SQL Server appears to treat both the same.
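The two patterns are also equivalent in what they return, which is why an optimizer is free to plan them identically. A minimal sketch of that equivalence, run through Python's built-in sqlite3 module rather than SQL Server (so it says nothing about SQL Server's plans); the table name t, the extra column col3 and the rows are hypothetical:

```python
import sqlite3

# In-memory database with made-up sample rows (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT, col3 REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "x", 0.5), (1, "x", 1.5), (1, "y", 2.5), (2, "x", 3.5)],
)

# Neither query guarantees an output order, so sort before comparing.
grouped = sorted(conn.execute("SELECT col1, col2 FROM t GROUP BY col1, col2").fetchall())
distinct = sorted(conn.execute("SELECT DISTINCT col1, col2 FROM t").fetchall())

print(grouped)              # [(1, 'x'), (1, 'y'), (2, 'x')]
print(grouped == distinct)  # True
```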

In MySQL I've found using a GROUP BY is often better in performance than DISTINCT.

Running "EXPLAIN SELECT DISTINCT ..." shows "Using where; Using temporary", meaning MySQL will create a temporary table,

whereas "EXPLAIN SELECT a, b, c FROM T1, T2 WHERE T2.A = T1.A GROUP BY a" just shows "Using where".
