Why are relational set-based queries better than cursors?

sql language-agnostic cursor

The main reason that I'm aware of is that set-based operations can be optimised by the engine by running them across multiple threads. For example, think of a quicksort - you can separate the list you're sorting into multiple "chunks" and sort each separately in their own thread. SQL engines can do similar things with huge amounts of data in one set-based query.

When you perform cursor-based operations, the engine can only run sequentially and the operation has to be single threaded.

sql language-agnostic cursor

Set based queries are (usually) faster because:

They have more information for the query optimizer to optimize
They can batch reads from disk
There's less logging involved for rollbacks, transaction logs, etc.
Less locks are taken, which decreases overhead
Set based logic is the focus of RDBMSs, so they've been heavily optimized for it (often, at the expense of procedural performance)

Pulling data out to the middle tier to process it can be useful, though, because it removes the processing overhead off the DB server (which is the hardest thing to scale, and is normally doing other things as well). Also, you normally don't have the same overheads (or benefits) in the middle tier. Things like transactional logging, built-in locking and blocking, etc. - sometimes these are necessary and useful, other times they're just a waste of resources.

A simple cursor with procedural logic vs. set based example (T-SQL) that will assign an area code based on the telephone exchange:

--CursorDECLARE @phoneNumber char(7)DECLARE c CURSOR LOCAL FAST_FORWARD FOR   SELECT PhoneNumber FROM Customer WHERE AreaCode IS NULLOPEN cFETCH NEXT FROM c INTO @phoneNumberWHILE @@FETCH_STATUS = 0 BEGIN   DECLARE @exchange char(3), @areaCode char(3)   SELECT @exchange = LEFT(@phoneNumber, 3)   SELECT @areaCode = AreaCode    FROM AreaCode_Exchange    WHERE Exchange = @exchange   IF @areaCode IS NOT NULL BEGIN       UPDATE Customer SET AreaCode = @areaCode       WHERE CURRENT OF c   END   FETCH NEXT FROM c INTO @phoneNumberENDCLOSE cDEALLOCATE cEND--SetUPDATE Customer SET    AreaCode = AreaCode_Exchange.AreaCodeFROM CustomerJOIN AreaCode_Exchange ON    LEFT(Customer.PhoneNumber, 3) = AreaCode_Exchange.ExchangeWHERE    Customer.AreaCode IS NULL

sql language-agnostic cursor

In addition to the above "let the DBMS do the work" (which is a great solution), there are a couple other good reasons to leave the query in the DBMS:

It's (subjectively) easier to read. When looking at the code later, would you rather try and parse a complex stored procedure (or client-side code) with loops and things, or would you rather look at a concise SQL statement?
It avoids network round trips. Why shove all that data to the client and then shove more back? Why thrash the network if you don't need to?
It's wasteful. Your DBMS and app server(s) will need to buffer some/all of that data to work on it. If you don't have infinite memory you'll likely page out other data; why kick out possibly important things from memory to buffer a result set that is mostly useless?
Why wouldn't you? You bought (or are otherwise using) a highly reliable, very fast DBMS. Why wouldn't you use it?

CodeHunter

Why are relational set-based queries better than cursors?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last