Is there a way to multithread a SqlDataReader?


Set up a producer/consumer queue, with one producer process that pulls from the reader and queues records as fast as it can, but does no "processing". Then have some other number of processes (how many you want depends on your system) dequeue and process each queued record.
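A minimal sketch of that producer/consumer setup, using `BlockingCollection<T>` as the bounded queue. The row counts, consumer count, and the stand-in producer loop are assumptions for illustration; the comments show where the real `SqlDataReader` would plug in:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class ProducerConsumerDemo
{
    // Drain `rows` records through a bounded queue with `consumerCount`
    // consumers; returns how many records the consumers processed.
    public static long Run(int rows, int consumerCount)
    {
        // Bounded so the producer blocks instead of exhausting memory
        // when consumers fall behind.
        var queue = new BlockingCollection<object[]>(boundedCapacity: 1000);

        // Producer: with a real SqlDataReader this loop would be
        //   while (reader.Read()) {
        //       var row = new object[reader.FieldCount];
        //       reader.GetValues(row);
        //       queue.Add(row);
        //   }
        // A counter stands in for the reader so the sketch is runnable.
        var producer = Task.Run(() =>
        {
            for (int i = 0; i < rows; i++)
                queue.Add(new object[] { i });
            queue.CompleteAdding();   // tell consumers there are no more rows
        });

        long processed = 0;
        var consumers = new Task[consumerCount];
        for (int c = 0; c < consumerCount; c++)
            consumers[c] = Task.Run(() =>
            {
                foreach (var row in queue.GetConsumingEnumerable())
                    Interlocked.Increment(ref processed); // real work goes here
            });

        producer.Wait();
        Task.WaitAll(consumers);
        return Interlocked.Read(ref processed);
    }

    static void Main() => Console.WriteLine(Run(10_000, 4)); // prints 10000
}
```

The bounded capacity is the important knob: it keeps the reader from racing arbitrarily far ahead of the workers.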


You shouldn't read that many rows on the client.

That being said, you can partition your query into multiple queries and execute them in parallel: launch multiple SqlCommands in separate threads and have each churn through a partition of the result. The A+ question is how to partition the result, and this depends largely on your data and your query:

  1. You can use a range of keys (e.g. ID between 1 and 10000, ID between 10001 and 20000, etc.)
  2. You can use an attribute (eg. RecordTypeID IN (1,2), RecordTypeID IN (3,4) etc)
  3. You can use a synthetic range (i.e. ROW_NUMBER() BETWEEN 1 AND 1000, etc.), but this is very problematic to pull off right
  4. You can use a hash (e.g. BINARY_CHECKSUM(*) % 10 = 0, BINARY_CHECKSUM(*) % 10 = 1, etc.)

You just have to be very careful that the partition queries do not overlap and block each other during execution (i.e. scan the same records and acquire exclusive (X) locks), thus serializing each other.
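As a sketch of option 1, assuming a reasonably dense integer key (the `Records` table and `ID` column are made-up names), you can compute contiguous, non-overlapping ranges and back one ranged `SqlCommand` with each:

```csharp
using System;

class RangePartition
{
    // Split [min, max] into `parts` contiguous, non-overlapping ranges.
    // The last range absorbs any remainder so every key is covered once.
    public static (long Lo, long Hi)[] Ranges(long min, long max, int parts)
    {
        var result = new (long Lo, long Hi)[parts];
        long span = (max - min + 1) / parts;
        long lo = min;
        for (int i = 0; i < parts; i++)
        {
            long hi = (i == parts - 1) ? max : lo + span - 1;
            result[i] = (lo, hi);
            lo = hi + 1;
        }
        return result;
    }

    static void Main()
    {
        foreach (var (lo, hi) in Ranges(1, 20000, 2))
            Console.WriteLine($"{lo}-{hi}"); // prints 1-10000 then 10001-20000

        // Each range would back one command on its own thread, e.g.:
        //   new SqlCommand(
        //       "SELECT ... FROM Records WHERE ID BETWEEN @lo AND @hi", conn)
        // with @lo/@hi bound per range and one connection per thread.
    }
}
```

Because the ranges are disjoint, each query scans a different slice of the index and the workers avoid stepping on each other's locks.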


Is it a simple ranged query like WHERE Id BETWEEN 1 AND 500000? If so, you can just kick off N queries that each return 1/N of the range. But it helps to know where you are bottlenecked with the single-threaded approach.

If you are doing contiguous reads from one disk spindle to fulfill the query, then you should probably stick with a single thread. If it is partitioned across spindles by some range, then you can intelligently tune your queries to maximize throughput from disk (i.e. read from each disk in parallel with separate queries). If you expect all of the rows to be in memory, then you can parallelize at will.

But if the query is more complex, then you may not be able to easily partition it without incurring a bunch of overhead. Most of the time the above options will not apply well, and the producer/consumer that Joel mentioned will be the only place to parallelize. Depending on how much time you spend processing each row, this may provide only trivial gains.
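The "kick off N queries" part might look like this sketch: one task per sub-range, each with its own connection. The `work` delegate stands in for "open a connection, execute the ranged query, process the rows", so the skeleton runs without a database; the commented lines show where the real ADO.NET calls would go (`connString` is an assumption):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class ParallelRanges
{
    // Run one worker per sub-range of [min, max]; each worker gets the
    // bounds of its slice and returns a per-slice result to aggregate.
    public static long RunPartitioned(long min, long max, int n,
                                      Func<long, long, long> work)
    {
        long step = (max - min + 1) / n;
        var tasks = Enumerable.Range(0, n).Select(i => Task.Run(() =>
        {
            long lo = min + i * step;
            long hi = (i == n - 1) ? max : lo + step - 1;
            // With a real database each task would do something like:
            //   using var conn = new SqlConnection(connString); // assumed
            //   conn.Open();
            //   using var cmd = new SqlCommand(
            //       "SELECT ... WHERE Id BETWEEN @lo AND @hi", conn);
            //   ...ExecuteReader() and process rows...
            return work(lo, hi);
        })).ToArray();

        Task.WaitAll(tasks);
        return tasks.Sum(t => t.Result);
    }

    static void Main()
    {
        // Count "rows" per range as a stand-in for real per-row processing.
        long total = RunPartitioned(1, 500_000, 4, (lo, hi) => hi - lo + 1);
        Console.WriteLine(total); // prints 500000
    }
}
```

One connection per task matters: a SqlConnection (and the single SqlDataReader it can have open) is not safe to share across threads.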