SQL: Inner joining two massive tables

sql sql-server sql-server-2008 query-optimization inner-join

I'm not an SQL tuning expert, but joining hundreds of millions of rows on a VARCHAR field doesn't sound like a good idea in any database system I know.

You could try adding an integer column to each table and computing a hash on the NAME field that should get the possible matches to a reasonable number before the engine has to look at the actual VARCHAR data.

sql sql-server sql-server-2008 query-optimization inner-join

For huge joins, sometimes explicitly choosing a loop join speeds things up:

SELECT EGM.Name, BioEntity.BioEntityId INTO AUXFROM EGM INNER LOOP JOIN BioEntity     ON EGM.name LIKE BioEntity.Name AND EGM.TypeId = BioEntity.TypeId

As always, posting your estimated execution plan could help us provide better answers.

EDIT: If both inputs are sorted (they should be, with the covering index), you can try a MERGE JOIN:

SELECT EGM.Name, BioEntity.BioEntityId INTO AUXFROM EGM INNER JOIN BioEntity     ON EGM.name LIKE BioEntity.Name AND EGM.TypeId = BioEntity.TypeIdOPTION (MERGE JOIN)

sql sql-server sql-server-2008 query-optimization inner-join

First, 100M-row joins are not at all unreasonable or uncommon.

However, I suspect the cause of the poor performance you're seeing may be related to the INTO clause. With that, you are not only doing a join, you are also writing the results to a new table. Your observation about the log file growing so huge is basically confirmation of this.

One thing to try: remove the INTO and see how it performs. If the performance is reasonable, then to address the slow write you should make sure that your DB log file is on a separate physical volume from the data. If it isn't, the disk heads will thrash (lots of seeks) as they read the data and write the log, and your perf will collapse (possibly to as little as 1/40th to 1/60th of what it could be otherwise).

CodeHunter

SQL: Inner joining two massive tables

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last