multithread read from disk?


Yes, it is possible. However:

Do all threads on the same processor use the same IO device to read from disk?

Yes: the read head on the disk. As an example, try copying two large files in parallel as opposed to one after the other. On a hard disk it will take significantly longer in parallel, because the OS uses scheduling algorithms to make sure the IO rate is "fair," or equal, between the two threads/processes. Because of this, the read head jumps back and forth between different parts of the disk, slowing the process down A LOT. The time to actually read the data is small compared to the time to seek to it, and when you're reading two different parts of the disk at once, you spend most of the time seeking.
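To make the "seeking dominates" argument concrete, here is a deliberately crude cost model in C. It is an illustration, not a benchmark. The seek time, transfer rate, file size and the chunk size at which the scheduler switches files are all made-up assumptions passed in by the caller. Real kernels soften the effect with read-ahead and request merging.

```c
/* Deliberately crude hard-disk cost model (illustration only).
   All parameters are assumptions supplied by the caller. */
typedef struct {
    double seek_ms;       /* assumed cost of one head move (seek + rotation) */
    double transfer_mb_s; /* assumed sequential transfer rate in MB/s */
} disk_model;

/* Files read one after the other: one seek per file, then a
   purely sequential transfer of the whole file. */
double series_read_ms(disk_model d, int n_files, double file_mb) {
    double transfer_ms = file_mb / d.transfer_mb_s * 1000.0;
    return n_files * (d.seek_ms + transfer_ms);
}

/* Files read "fairly" in parallel: the scheduler alternates between files
   every chunk_mb, so every chunk pays for a seek. Assumes file_mb is a
   multiple of chunk_mb. The data transferred is the same as in the
   series case; only the number of seeks changes. */
double interleaved_read_ms(disk_model d, int n_files, double file_mb,
                           double chunk_mb) {
    double seeks = n_files * (file_mb / chunk_mb);
    double transfer_ms = n_files * file_mb / d.transfer_mb_s * 1000.0;
    return seeks * d.seek_ms + transfer_ms;
}
```

With an assumed 10 ms per seek, 100 MB/s transfer, two 100 MB files and a switch every 1 MB, `series_read_ms` gives 2020 ms and `interleaved_read_ms` gives 4000 ms. The 2000 ms of actual transfer is identical in both cases; the difference is entirely extra seeks.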

Note that all of this assumes you're using a hard disk. An SSD has no head to move, so parallel reads are not slower. (Edit: according to the comments, parallel is actually faster on an SSD, because it can service several requests at once.) With RAID the situation becomes more complicated, and (obviously) depends on what kind of RAID you're using.

This is what it looks like (I've unwrapped the circular disk into a rectangle because ASCII circles are hard, and simplified the data layout to make it easier to read):

Assume the files are separated by some space on the platter like so:

|         |

A series read (one file, then the other) looks like this, where * indicates reading:

    space ----->
    |        *|  t
    |        *|  i
    |        *|  m
    |        *|  e
    |        *|  |
    |       / |  |
    |     /   |  |
    |   /     |  V
    |  /      |
    |*        |
    |*        |
    |*        |
    |*        |

while a parallel read looks like this:

    |       \ |
    |        *|
    |       / |
    |     /   |
    |   /     |
    |  /      |
    |*        |
    |  \      |
    |    \    |
    |     \   |
    |       \ |
    |        *|
    |       / |
    |     /   |
    |   /     |
    |  /      |
    |*        |
    |  \      |
    |    \    |
    |     \   |
    |       \ |
    |        *|

and so on.


If you're doing this on Windows you might want to look into the ReadFileScatter function. It reads a contiguous range of a file into an array of page-sized buffers in a single asynchronous (overlapped) call; the file must be opened with FILE_FLAG_OVERLAPPED and FILE_FLAG_NO_BUFFERING. This lets the OS better manage the file IO bottleneck and hopefully optimize the reads.

The matching write call on Windows would be WriteFileGather.

On UNIX the equivalent scatter/gather calls are readv and writev. Unlike ReadFileScatter, they are synchronous.
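A minimal POSIX sketch of the scatter side, assuming a file that starts with a fixed-size header followed by a body. The helper name `read_header_and_body` and the header/body split are hypothetical; only `open`, `readv`, `close` and `struct iovec` are the real POSIX API. The Windows ReadFileScatter variant is not shown because of its alignment and overlapped-handle requirements.

```c
#include <fcntl.h>    /* open */
#include <sys/uio.h>  /* readv, struct iovec */
#include <unistd.h>   /* close, ssize_t */

/* Hypothetical helper: read the first hdr_len bytes of `path` into `hdr`
   and the following body_len bytes into `body` with ONE readv() call.
   The bytes come from one contiguous region of the file; readv only
   scatters them across several memory buffers. */
ssize_t read_header_and_body(const char *path, char *hdr, size_t hdr_len,
                             char *body, size_t body_len) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct iovec iov[2] = {
        { .iov_base = hdr,  .iov_len = hdr_len  },
        { .iov_base = body, .iov_len = body_len },
    };
    /* Fills iov[0] completely before iov[1]; like read(), it may return
       fewer bytes than requested (e.g. at end of file). */
    ssize_t n = readv(fd, iov, 2);
    close(fd);
    return n;
}
```

writev is the mirror image: it gathers several memory buffers into one contiguous write.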


As mentioned in the other answers, a parallel read may be slower depending on how the file is physically stored on disk: if the head has to move a significant distance, it can cause an actual slowdown. That said, there are storage systems that can support multiple simultaneous reads and writes efficiently. The simplest one I can imagine is an SSD. I have worked with magnificent storage systems from IBM that could perform simultaneous reads and writes with no slowdown. So let's assume you have such a file system and physical storage, which will not slow down on parallel reads.

In that case parallel reads make a lot of sense. In general there are two ways to achieve that:

  1. If you want to use the standard C/C++ library to perform the IO then the only option you have is to keep one open file handle (descriptor) per thread. This is because the file pointer (which points to where to read or write from in the file) is kept per handle. So if you try to read simultaneously from the same file handle you will not have any way of knowing what you are actually reading.
  2. Use a platform-specific API to perform asynchronous (OVERLAPPED) IO. On Windows you use the WinAPI functions with what is called OVERLAPPED IO. On Unix/Linux you have POSIX AIO, although I understand that its use is discouraged; I haven't seen a satisfactory explanation of why.
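Approach 1 can be sketched roughly as follows. This assumes POSIX threads (the answer does not say which threading API it used), and the names `chunk_task`, `read_chunk` and `parallel_read` are hypothetical. Each thread calls `fopen` itself, so each has its own `FILE*` and therefore its own file position.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Work item for one thread (hypothetical type). */
typedef struct {
    const char *path;
    long offset;   /* where this thread's chunk starts */
    size_t len;    /* how many bytes to read */
    char *dst;     /* where to store them */
    size_t got;    /* bytes actually read */
} chunk_task;

static void *read_chunk(void *arg) {
    chunk_task *t = arg;
    t->got = 0;
    FILE *f = fopen(t->path, "rb");  /* private handle => private file position */
    if (f == NULL)
        return NULL;
    /* fseek takes a long; very large files on 32-bit systems need fseeko. */
    if (fseek(f, t->offset, SEEK_SET) == 0)
        t->got = fread(t->dst, 1, t->len, f);
    fclose(f);
    return NULL;
}

/* Read `size` bytes of `path` into `buf` with `nthreads` threads.
   Returns the total bytes read, or -1 on error. */
long parallel_read(const char *path, char *buf, size_t size, int nthreads) {
    if (nthreads <= 0)
        return -1;
    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    chunk_task *tasks = malloc((size_t)nthreads * sizeof *tasks);
    if (tids == NULL || tasks == NULL) {
        free(tids);
        free(tasks);
        return -1;
    }
    size_t per = size / (size_t)nthreads;
    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        size_t off = (size_t)i * per;
        /* The last thread also takes the remainder. */
        size_t len = (i == nthreads - 1) ? size - off : per;
        tasks[i] = (chunk_task){ path, (long)off, len, buf + off, 0 };
        if (pthread_create(&tids[i], NULL, read_chunk, &tasks[i]) != 0)
            break;
        started++;
    }
    long total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += (long)tasks[i].got;
    }
    free(tids);
    free(tasks);
    return started == nthreads ? total : -1;
}
```

Whether this is actually faster than a single sequential read depends entirely on the storage, as discussed above.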

I have implemented the fd-per-thread approach on both Linux and Windows, and the OVERLAPPED approach on Windows. Both work great.
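For approach 2 on Unix/Linux, here is a rough POSIX AIO sketch that queues several reads at different offsets on a single descriptor and then waits for all of them. The `read_req` type and `aio_read_all` name are hypothetical; `aio_read`, `aio_error`, `aio_return` and `aio_suspend` are the standard POSIX calls. As far as I know, glibc implements POSIX AIO in user space with helper threads, so on Linux this is not necessarily kernel-level asynchronous disk IO. On glibc older than 2.34 you also need to link with `-lrt`.

```c
#include <aio.h>     /* aio_read, aio_error, aio_return, aio_suspend */
#include <errno.h>   /* EINPROGRESS */
#include <fcntl.h>   /* open */
#include <signal.h>  /* SIGEV_NONE */
#include <stdlib.h>  /* calloc, free */
#include <unistd.h>  /* close */

/* Hypothetical request type: one contiguous range of the file. */
typedef struct {
    char  *buf;
    size_t len;
    off_t  off;
} read_req;

/* Queue all n reads on ONE descriptor, then wait for them.
   Returns 0 if every request was queued and read in full, else -1. */
int aio_read_all(const char *path, const read_req *reqs, int n) {
    if (n <= 0)
        return -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct aiocb *cbs = calloc((size_t)n, sizeof *cbs); /* zeroed control blocks */
    if (cbs == NULL) {
        close(fd);
        return -1;
    }
    int submitted = 0;
    for (; submitted < n; submitted++) {
        struct aiocb *cb = &cbs[submitted];
        cb->aio_fildes = fd;
        cb->aio_buf    = reqs[submitted].buf;
        cb->aio_nbytes = reqs[submitted].len;
        cb->aio_offset = reqs[submitted].off; /* explicit offset: no shared file pointer */
        cb->aio_sigevent.sigev_notify = SIGEV_NONE; /* we poll instead of using signals */
        if (aio_read(cb) != 0)
            break; /* could not queue this one; stop submitting */
    }
    int rc = (submitted == n) ? 0 : -1;
    /* Wait for every request that was actually queued, even on failure,
       because the buffers and control blocks must outlive the IO. */
    for (int i = 0; i < submitted; i++) {
        const struct aiocb *const one[1] = { &cbs[i] };
        while (aio_error(&cbs[i]) == EINPROGRESS)
            aio_suspend(one, 1, NULL);
        if (aio_return(&cbs[i]) != (ssize_t)cbs[i].aio_nbytes)
            rc = -1;
    }
    free(cbs);
    close(fd);
    return rc;
}
```

The Windows OVERLAPPED equivalent follows the same pattern: issue several reads with different offsets in their OVERLAPPED structures, then wait for each to complete.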