Group by is case sensitive in T-SQL even though db and server collations are CI
This is more of extended comment that real answer.
I believe this issue is coming from how SQL Server is attempting to evaluate case
statement expression.
To prove that server is case insensetive you can run the following two statements
SELECT CASE WHEN 'Bertrand' = 'bertrand' THEN 'true' ELSE 'false' end
-
DECLARE @base TABLE(NAME VARCHAR(1),value INT)INSERT INTO @base Values('a',0),('b',0),('B',0)SELECT * FROM @baseSELECT name, COUNT(value) AS CntFROM @baseGROUP BY NAME
results:
as you can see here even though letter in second row is lower case and in third row is upper case, group by clause ignores the case. Looking at execution plan there are two expression for
Expr 1007 COUNT([value]) Expr 1004 CONVERT_IMPLICIT(int,[Expr1007],0)
now when we change it to case
SELECT CASE WHEN name = 'a' THEN 'adam' ELSE 'bertrand' END AS name, COUNT(value) AS CntFROM @baseGROUP BY CASE WHEN name = 'a' THEN 'adam' ELSE 'bertrand' END
execution plan shows 3 expressions. 2 from above and new one
Expr 1004 CASE WHEN [NAME]='a' THEN 'adam' ELSE 'bertrand' END
so at this point aggregate function is no longer evaluating value of the column name
but now it evaluating value of the expression.
What i think is happening is, could be incorrect. When SQL server converts both CASE
statement in SELECT
and GROUP BY
clause to a expression it comes up with different expression value. In this case you might as well do 'bertrand'
in select
and 'charlie'
in group by
clause because if CASE
expression is not 100% match between select and group by clause SQL Server will consider them as different Expr
aka (columns) that no longer match.
Update:
To take this one step further, the following statement will also fail.
SELECT CASE WHEN name = 'a' THEN 'adam' ELSE UPPER('bertrand') END AS name ,COUNT(value) AS CntFROM @baseGROUP BY CASE WHEN name = 'a' THEN 'adam' ELSE UPPER('Bertrand') END
Even wrapping the different case strings in UPPER()
function, SQL Server is still unable to process it.
The problem is the case when statement in select and group by should be same, but you can write your query as below
select name, count(value) from( SELECT CASE name WHEN 'a' THEN 'adam' ELSE 'bertrand' END name, value FROM ( SELECT 'a' AS name,1 AS value UNION SELECT 'b',1 UNION SELECT 'b',2 )a )tGROUP BY name
This is not about case sensivity, take a look at this sample
This worksSELECT CASE name WHEN 'a' THEN 'adam' when 'b' then 'bertrand' end name, COUNT(value) FROM(SELECT 'a' AS name,1 AS valueUNIONSELECT 'b',1UNIONSELECT 'b',2)aGROUP BY CASE name WHEN 'a' THEN 'adam' when 'b' then 'bertrand' ENDThis does not workSELECT CASE name when 'b' then 'bertrand' WHEN 'a' THEN 'adam' end name, COUNT(value) FROM(SELECT 'a' AS name,1 AS valueUNIONSELECT 'b',1UNIONSELECT 'b',2)aGROUP BY CASE name WHEN 'a' THEN 'adam' when 'b' then 'bertrand' END
While the case of 'adam' and 'bertrand' are the same.
You have found something that is genuinely weird, but I think the problem is that you are using the case statement at all in the group by statement. It should be:
SELECT CASE name WHEN 'a' THEN 'adam' ELSE 'bertrand' END AS name FROM(SELECT 'a' AS name,1 AS valueUNIONSELECT 'b',1UNIONSELECT 'b',2) aGROUP BY name
The group by should apply to the entire table and not individual rows. I could be missing some reason to do this, but I don't think it makes sense to conditionally group by a value.
I am more surprised that the first one works at all than that the second one doesn't work. Comparing 'a' = 'A' is subtly different from comparing a column to another column. SQL server doesn't seem to use the collation settings in this check to see if the column is in the group by. The error message you receive from your second query is saying 'this column in the select is not the same as the column in the group by' and not 'these values are not equal'.