How many threads to create and when? How many threads to create and when? multithreading multithreading

How many threads to create and when?


It is important to understand the purpose of using multiple threads on a server; many threads in a server serve to decrease latency rather than to increase speed. You don't make the cpu more faster by having more threads but you make it more likely a thread will always appear at within a given period to handle a request.

Having a bunch of threads which just move data in parallel is a rather inefficient shot-gun (Creating one thread per request naturally just fails completely). Using the thread pool pattern can be a more effective, focused approach to decreasing latency.

Now, in the thread pool, you want to have at least as many threads as you have CPUs/cores. You can have more than this but the extra threads will again only decrease latency and not increase speed.

Think the problem of organizing server threads as akin to organizing a line in a super market. Would you like to have a lot of cashiers who work more slowly or one cashier who works super fast? The problem with the fast cashier isn't speed but rather that one customer with a lot of groceries might still take up a lot of their time. The need for many threads comes from the possibility that a few request that will take a lot of time and block all your threads. By this reasoning, whether you benefit from many slower cashiers depends on whether your have the same number of groceries or wildly different numbers. Getting back to the basic model, what this means is that you have to play with your thread number to figure what is optimal given the particular characteristics of your traffic, looking at the time taken to process each request.


Classically the number of reasonable threads is depending on the number of execution units, the ratio of IO to computation and the available memory.

Number of Execution Units (XU)

That counts how many threads can be active at the same time. Depending on your computations that might or might not count stuff like hyperthreads -- mixed instruction workloads work better.

Ratio of IO to Computation (%IO)

If the threads never wait for IO but always compute (%IO = 0), using more threads than XUs only increase the overhead of memory pressure and context switching. If the threads always wait for IO and never compute (%IO = 1) then using a variant of poll() or select() might be a good idea.

For all other situations XU / %IO gives an approximation of how many threads are needed to fully use the available XUs.

Available Memory (Mem)

This is more of a upper limit. Each thread uses a certain amount of system resources (MemUse). Mem / MemUse gives you an approximation of how many threads can be supported by the system.

Other Factors

The performance of the whole system can still be constrained by other factors even if you can guess or (better) measure the numbers above. For example, there might be another service running on the system, which uses some of the XUs and memory. Another problem is general available IO bandwidth (IOCap). If you need less computing resources per transferred byte than your XUs provide, obviously you'll need to care less about using them completely and more about increasing IO throughput.

For more about this latter problem, see this Google Talk about the Roofline Model.


I'd say, try using just ONE thread; it makes programming much easier. Although you'll need to use something like libevent to multiplex the connections, you won't have any unexpected synchronisation issues.

Once you've got a working single-threaded implementation, you can do performance testing and make a decision on whether a multi-threaded one is necessary.

Even if a multithreaded implementation is necessary, it may be easier to break it into several processes instead of threads (i.e. not sharing address space; either fork() or exec multiple copies of the process from a parent) if they don't have a lot of shared data.

You could also consider using something like Python's "Twisted" to make implementation easier (this is what it's designed for).

Really there's probably not a good case for using threads over processes - but maybe there is in your case, it's difficult to say. It depends how much data you need to share between threads.