SQL Server "<>" operator is very slow compared to "=" on table with a few million rows
= reduces the join operation to one single matching row from each table (presuming those docids are unique).
Think of it this way- you've got a dance with 5 boys and 5 girls:
Adam AliceBob BettyCharly CathyDick DebEvan Elly
You pair them up by first letter. So
One single pairing
But if you pair them up by "First letters do NOT match", you end up with:
you've MASSIVELY increased the number of pairings. This is why your
<> query is taking so long. You're essentially trying to fetch
m x n rows, rather than just
min(m,n). With this data, you end up with 25 rows, rather than 5. For your specified table sizes, you're working with 77,000 * 2,700,000 = 207.9 billion rows, minus 77,000 where the two ids match up, for a total of 207,899,923,000 rows in the joined data set.
given your query requirements, try a left join and look for null right-side records:
SELECT DISTINCT logs.DOCIDFROM logsLEFT JOIN forms ON logs.DOCID = forms.DOCIDWHERE forms.DOCID IS NULL