TCP/IP - Solving the C10K with the thread per client approach TCP/IP - Solving the C10K with the thread per client approach multithreading multithreading

TCP/IP - Solving the C10K with the thread per client approach


Absolutely. A standard server can handle more than 10K concurrent connections using the model with one thread per connection. I have build such an application, and five years ago, it was running with more than 50K concurrent connections per process on a standard Linux server. Nowadays, it should be possible to run the same application with more than 250K concurrent connections on current hardware.

There are only a few things to keep in mind:

  • Reuse threads by using a thread pool. There is no need to kill threads if they are not used, because the resource usage should be optimized for peak loads.
  • Stack size: By default each Linux thread reserves 8 MB for its stack. That sums up to 80 GB for 10K threads. You should set the default stack size to some value between 64k and 512k, which isn't a problem, because most applications don't require deeper call stacks.
  • If the connections are short-lived, optimize for new connections by creating several sockets on the same endpoint with the option SO_REUSEPORT.
  • Increase the user limits: open files (default 1.024), max user processes
  • Increase system limits, e.g. /proc/sys/kernel/pid_max (default 32K), /proc/sys/kernel/threads-max, and /proc/sys/vm/max_map_count (default 65K).

The application mentioned above was initially designed to handle only 2K concurrent connections. However, with the growth in use, we didn't have to make significant changes to the code in order to scale up to 50K connections.


The usual approaches for servers are either: (a) thread per connection (often with a thread pool), or (b) single threaded with asynchronous IO (often with epoll or kqueue). My thinking is that some elements of these approaches can, and often should, be combined to use asynchronous IO (with epoll or kqueue) and then hand off the connection request to a thread pool to process. This approach would combine the efficient dispatch of asynchronous IO with the parallelism provided by the thread pool.

I have written such a server for fun (in C++) that uses epoll on Linux and kqueue on FreeBSD and OSX along with a thread pool. I just need to run it through its paces for heavy testing, do some code cleanup, and then toss it out on github (hopefully soon).