Counting DISTINCT over multiple columns

sql sql-server performance tsql query-optimization

If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.

Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.

I believe a distinct count of the computed column would be equivalent to your query.

sql sql-server performance tsql query-optimization

Edit: Altered from the less-than-reliable checksum-only queryI've discovered a way to do this (in SQL Server 2005) that works pretty well for me and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars to make the distinct more reliable

SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )FROM DocumentOutPutItems

sql sql-server performance tsql query-optimization

What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations why not try it?

It certainly works as you might expect in Oracle.

SQL> select distinct deptno, job from emp  2  order by deptno, job  3  /    DEPTNO JOB---------- ---------        10 CLERK        10 MANAGER        10 PRESIDENT        20 ANALYST        20 CLERK        20 MANAGER        30 CLERK        30 MANAGER        30 SALESMAN9 rows selected.SQL> select count(*) from (  2  select distinct deptno, job from emp  3  )  4  /  COUNT(*)----------         9SQL>

edit

I went down a blind alley with analytics but the answer was depressingly obvious...

SQL> select count(distinct concat(deptno,job)) from emp  2  /COUNT(DISTINCTCONCAT(DEPTNO,JOB))---------------------------------                                9SQL>

edit 2

Given the following data the concatenating solution provided above will miscount:

col1  col2----  ----A     AAAA    A

So we to include a separator...

select col1 + '*' + col2 from t23/

Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.

CodeHunter

Counting DISTINCT over multiple columns

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last