Which provides better performance one big join or multiple queries?

sql database database-design query-optimization

I agree with everyone who's said a single join will probably be more efficient, even with a lot of tables. It's also less development effort than doing the work in your application code. This assumes the tables are appropriately indexed, with an index on each foreign key column, and (of course) an index on each primary key column.

Your best bet is to try the easiest approach (the big join) first, and see how well it performs. If it performs well, then great - you're done. If it performs poorly, profile the query and look for missing indexes on your tables.

Your option #1 is not likely to perform well, due to the number of network round-trips (as anijhaw mentioned). This is sometimes called the "select N+1" problem - you do one SELECT to get the list of N applications, and then do N SELECTs in a loop to get the customers. This record-at-a-time looping is natural to application programmers; but SQL works much better when you operate on whole sets of data at once.

If option #2 is slow even with good indexing, you may want to look into caching. You can cache in the database (using a summary table or materialized/indexed view), in the application (if there is enough RAM), or in a dedicated caching server such as memcached. Of course, this depends on how up-to-date your query results need to be. If everything has to be fully up-to-date, then any cache would have to be updated whenever the underlying tables are updated - it gets complicated and becomes less useful.

This sounds like a reporting query though, and reporting often doesn't need to be real-time. So caching might be able to help you.

Depending on your DBMS, another thing to think about is the impact of this query on other queries hitting the same database. If your DBMS allows readers to block writers, then this query could prevent updates to the tables if it takes a long time to run. That would be bad. Oracle doesn't have this problem, and neither does SQL Server when run in "read committed snapshot" mode. I don't know about MySQL though.

sql database database-design query-optimization

If this customer_id is unique in your customer-table (and the other IDs are unique in the other tables), so your query only returns 1 row per Application, then doing a single SELECT is certainly more efficient.

Joining all the required customers in one query will be optimized, while using lots of single SELECTs can't.

EDIT
I tried this with Oracle PL/SQL with 50.000 applications and 50.000 matching customers.

Solution with selecting everything in one query took
0.172 s

Solution with selecting every customer in a single SELECT took
1.984 s

And this is most likely getting worse with other clients or when accessing over network.

sql database database-design query-optimization

Single join should be faster for two main reasons.

If you are querying over a network, then there is overhead in using number of queries instead of a single query.

A join would be optimized inside the DBMS using the query optimizer so will be faster than executing several queries.

CodeHunter

Which provides better performance one big join or multiple queries?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last