SQL Server "<>" operator is very slow compared to "=" on table with a few million rows SQL Server "<>" operator is very slow compared to "=" on table with a few million rows sql sql

SQL Server "<>" operator is very slow compared to "=" on table with a few million rows


Because = reduces the join operation to one single matching row from each table (presuming those docids are unique).

Think of it this way- you've got a dance with 5 boys and 5 girls:

Adam      AliceBob       BettyCharly    CathyDick      DebEvan      Elly

You pair them up by first letter. So

Adam->AliceBob->Bettyetc...

One single pairing

But if you pair them up by "First letters do NOT match", you end up with:

Adam->BettyAdam->CathyAdam->DebAdam->EllyBob->Aliceetc...

you've MASSIVELY increased the number of pairings. This is why your <> query is taking so long. You're essentially trying to fetch m x n rows, rather than just min(m,n). With this data, you end up with 25 rows, rather than 5. For your specified table sizes, you're working with 77,000 * 2,700,000 = 207.9 billion rows, minus 77,000 where the two ids match up, for a total of 207,899,923,000 rows in the joined data set.


given your query requirements, try a left join and look for null right-side records:

SELECT DISTINCT logs.DOCIDFROM logsLEFT JOIN forms ON logs.DOCID = forms.DOCIDWHERE forms.DOCID IS NULL


Two reasons:

  • queries for equivalence can generally use indexes (if available), while query for nonequivalence cannot

  • <> returns so much more data.

Your query with <> is bogus. What should it return?


This is totally dependant on the distribution of values in the table. If the column you are searching, for example had the same value (= forms.DOCID) for 99.99 % of the rows and only one row with a different value, you would see exactly the opposite behavior.