LINQ Joins - Performance LINQ Joins - Performance sql sql

LINQ Joins - Performance

The methods on System.Linq.Enumerable are performed in the order they are issued. There is no query optimizer at play.

Many methods are very lazy, which allows you to not fully enumerate the source by putting .First or .Any or .Take at the end of the query. That is the easiest optimization to be had.

For System.Linq.Enumerable.Join specifically, the docs state that this is a hash join.

The default equality comparer, Default, is used to hash and compare keys.

So examples:

//hash join (n+m) Enumerable.Joinfrom a in theAsjoin b in theBs on a.prop equals b.prop//nestedloop join (n*m)  Enumerable.SelectManyfrom a in theAsfrom b in theBswhere a.prop == b.prop

Linq to SQL does not send join hints to the server. Thus the performance of a join using Linq to SQL will be identical to the performance of the same join sent "directly" to the server (i.e. using pure ADO or SQL Server Management Studio) without any hints specified.

Linq to SQL also doesn't allow you to use join hints (as far as I know). So if you want to force a specific type of join, you'll have to do it using a stored procedure or the Execute[Command|Query] method. But unless you specify a join type by writing INNER [HASH|LOOP|MERGE] JOIN, then SQL Server always picks the type of join it thinks will be most efficient - it doesn't matter where the query came from.

Other Linq query providers - such as Entity Framework and NHibernate Linq - will do exactly the same thing as Linq to SQL. None of these have any direct knowledge of how you've indexed your database and so none of them send join hints.

Linq to Objects is a little different - it will (almost?) always perform a "hash join" in SQL Server parlance. That is because it lacks the indexes necessary to do a merge join, and hash joins are usually more efficient than nested loops, unless the number of elements is very small. But determining the number of elements in an IEnumerable<T> might require a full iteration in the first place, so in most cases it's faster just to assume the worst and use a hashing algorithm.

LINQ itself does not chose algorithms of any kind, as LINQ, strictly speaking, is simply a way of expressing a query in SQL-like syntax that can map to function calls on either IEnumerable<T> or IQueryable<T>. LINQ is entirely a language feature and does not provide functionality, only another way of expressing existing function calls.

In the case of IQueryable<T>, it's entirely up to the provider (such as LINQ to SQL) to chose the best method of producing the results.

In the case of LINQ to Objects (using IEnumerable<T>), simple enumeration is what's used (roughly equivalent to nested loops) in all cases. There is no deep inspection (or even knowledge of) the underlying data types in order to optimize the query.