Is count(*) really expensive?

asp.net sql-server-2005 performance premature-optimization

In general, the cost of COUNT(*) is proportional to the number of records satisfying the query conditions, plus the time required to prepare those records (which depends on the complexity of the underlying query).

In simple cases where you're dealing with a single table, there are often specific optimisations in place to make such an operation cheap. For example, COUNT(*) without WHERE conditions on a single MyISAM table in MySQL is instantaneous, because the row count is stored in the table metadata.

Let's consider two queries:

SELECT  COUNT(*)
FROM    largeTableA a

Since every record satisfies the query, the COUNT(*) cost is proportional to the number of records in the table (i.e., proportional to what it returns), assuming the engine needs to visit the rows and there isn't a specific optimisation in place to handle it.

SELECT  COUNT(*)
FROM    largeTableA a
JOIN    largeTableB b
ON      a.id = b.id

In this case, the engine will most probably use a HASH JOIN, and the execution plan will be something like this:

1. Build a hash table on the smaller of the two tables.
2. Scan the larger table, looking up each record in the hash table.
3. Count the matches as it goes.

In this case, the COUNT(*) overhead (step 3) will be negligible and the query time will be determined almost entirely by steps 1 and 2, that is, building the hash table and probing it. For such a query, the time will be O(a + b), where a and b are the row counts of the two tables: it does not really depend on the number of matches.
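The three steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular engine implements it; the function name hash_join_count and the use of plain lists of id values in place of tables are assumptions made for the example.

```python
from collections import Counter
from typing import Hashable, Iterable


def hash_join_count(a_ids: Iterable[Hashable], b_ids: Iterable[Hashable]) -> int:
    """Count the rows an equi-join on id would produce, hash-join style.

    Hypothetical helper: each argument stands in for the id column of a table.
    """
    a_ids, b_ids = list(a_ids), list(b_ids)

    # Step 1: build a hash table on the smaller input (id -> number of rows).
    build, probe = (a_ids, b_ids) if len(a_ids) <= len(b_ids) else (b_ids, a_ids)
    table = Counter(build)

    # Steps 2 and 3: scan the larger input, probe the hash table for each row,
    # and add up the matches as we go.
    matches = 0
    for key in probe:
        matches += table.get(key, 0)
    return matches


if __name__ == "__main__":
    print(hash_join_count([1, 2, 3, 4], [3, 4, 5]))
```

Each input is read once and each probe is a constant-time lookup on average, which is where the O(a + b) estimate comes from; the counting itself is a single addition per probed row.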

However, if there are indexes on both a.id and b.id, a MERGE JOIN may be chosen instead, and the COUNT(*) time will be proportional to the number of matches again, since an index seek will be performed after each match.


You need to attach SQL Profiler or an app-level profiler like L2SProf and look at the real query costs in your context before:

guessing what the problem is and trying to determine the likely benefits of a potential solution
allowing others to guess for you on da interwebs - there's lots of misinformation without citations about, including in this thread (but not in this post :P)

When you've done that, it'll be clear what the best approach is - i.e., whether the SELECT COUNT is dominating things or not, etc.

And having done that, you'll also know whether any changes you choose to make have had a positive or a negative impact.


As others have said, COUNT(*) always physically counts rows, so if you can do that once and cache the result, that's certainly preferable.
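As a rough sketch of the "count once and cache" idea, here is a small time-based cache in Python. Everything in it is hypothetical: CachedCount, fetch_count, ttl_seconds and clock are names made up for the example, and fetch_count stands in for whatever call actually runs the SELECT COUNT(*) in your application.

```python
import time
from typing import Callable, Optional


class CachedCount:
    """Remember the result of an expensive count for ttl_seconds (hypothetical helper)."""

    def __init__(
        self,
        fetch_count: Callable[[], int],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_count = fetch_count  # assumed to run the real COUNT(*) query
        self._ttl = ttl_seconds
        self._clock = clock              # injectable so the cache can be tested
        self._value: Optional[int] = None
        self._fetched_at = 0.0

    def get(self) -> int:
        now = self._clock()
        # Re-run the query only on first use or once the cached value has expired.
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch_count()
            self._fetched_at = now
        return self._value
```

How long a stale count is acceptable depends on the application; for a pagination footer, a minute or so is often fine, but that is a judgment call rather than a rule.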

If you benchmark and determine that the cost is negligible, you don't (currently) have a problem.

If it turns out to be too expensive for your scenario, you could make your pagination 'fuzzy', as in "Showing 1 to 500 of approx 30,000", by using

SELECT rows FROM sysindexes WHERE id = OBJECT_ID('sometable') AND indid < 2

which will return an approximation of the number of rows (it's approximate because it's not updated until a CHECKPOINT).
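To show how the approximate figure might feed the 'fuzzy' label, here is a minimal Python sketch. The function name fuzzy_page_label and its parameters are assumptions for the example; approx_total is whatever the query above returned, and rows_on_page is the number of rows actually fetched for the current page.

```python
def fuzzy_page_label(offset: int, rows_on_page: int, approx_total: int) -> str:
    """Build a 'Showing X to Y of approx N' label (hypothetical helper).

    offset is the number of rows skipped before the current page.
    """
    first = offset + 1
    last = offset + rows_on_page
    # The estimate can lag behind reality, so never claim fewer rows in
    # total than the page has already shown.
    shown_total = max(approx_total, last)
    return f"Showing {first:,} to {last:,} of approx {shown_total:,}"


if __name__ == "__main__":
    print(fuzzy_page_label(0, 500, 30000))
```

Clamping the total to at least the last row displayed is just one way to avoid an obviously inconsistent label when the estimate is stale.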
