Should we use multiple acceptor sockets to accept a large number of connections?


Having had to handle this kind of situation in production, here's a good way to approach the problem:

First, set up a single thread to handle all incoming connections. Adjust the affinity map so that this thread has a dedicated core that no other thread in your application (or even on your entire system) will touch. You can also modify your boot configuration so that certain cores are never automatically assigned work unless they are explicitly requested (e.g. the isolcpus kernel boot parameter).

Mark that core as unused, and then explicitly request it in your code for the "listen to socket" thread via cpuset.
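A minimal sketch of the pinning step, assuming Linux with glibc and assuming core 3 is the one you isolated (the core number is only an example):

```c
/* Pin the calling thread to one isolated core (Linux/glibc).
 * Assumes the box was booted with e.g. isolcpus=3, so core 3 has no other work. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

static void pin_current_thread_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);        /* allow exactly one core */

    int err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (err != 0)
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));
}
```

The same effect can be achieved from outside the process with taskset or a cpuset cgroup, which keeps the placement policy out of the code.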

Next, set up a queue (ideally a priority queue) that prioritizes write operations (i.e. the "second readers-writers problem"). Then set up however many worker threads as you see fit.
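As a starting point, here is a minimal sketch of a mutex-protected FIFO of client FDs. It is a plain FIFO, not the writer-prioritized variant described above (that would add extra bookkeeping on top of the same structure); the names and capacity are arbitrary:

```c
#include <pthread.h>
#include <stddef.h>

#define FD_QUEUE_CAP 1024

/* Fixed-size ring buffer of client FDs shared by the listener and the workers.
 * Initialize lock / not_empty / not_full with PTHREAD_MUTEX_INITIALIZER and
 * PTHREAD_COND_INITIALIZER (or the corresponding *_init() calls). */
struct fd_queue {
    int             fds[FD_QUEUE_CAP];
    size_t          head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    pthread_cond_t  not_full;
};

void fd_queue_push(struct fd_queue *q, int fd)       /* called by the listener */
{
    pthread_mutex_lock(&q->lock);
    while (q->count == FD_QUEUE_CAP)                 /* wait for room */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->fds[q->tail] = fd;
    q->tail = (q->tail + 1) % FD_QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

int fd_queue_pop(struct fd_queue *q)                 /* called by workers */
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                            /* wait for an FD */
        pthread_cond_wait(&q->not_empty, &q->lock);
    int fd = q->fds[q->head];
    q->head = (q->head + 1) % FD_QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return fd;
}
```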

At this point, the goal of the "incoming connections" thread should be to:

  • Accept incoming connections via accept().
  • Pass these connection file descriptors (FDs) off to your writer-prioritized queue structure as quickly as possible.
  • Go back to its accept() state as quickly as possible.

This will allow you to delegate incoming connections as quickly as possible. Your worker threads can grab items from the shared queue as they arrive. It might also be worth having a second, high-priority thread that grabs data from this queue, and moves it to a secondary queue, saving the "listen to socket" thread from having to spend extra cycles delegating client FDs.
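To make the listener's job concrete, here is a sketch of that accept-and-enqueue loop, reusing the fd_queue from the earlier sketch; the port and backlog are arbitrary example values:

```c
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

struct fd_queue;                                   /* from the earlier sketch */
void fd_queue_push(struct fd_queue *q, int fd);

void acceptor_loop(struct fd_queue *q)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) { perror("socket"); exit(1); }

    int one = 1;
    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(8080);            /* example port */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("bind"); exit(1); }
    if (listen(lfd, SOMAXCONN) < 0) { perror("listen"); exit(1); }

    for (;;) {
        int cfd = accept(lfd, NULL, NULL);         /* block until a client arrives */
        if (cfd < 0) { perror("accept"); continue; }
        fd_queue_push(q, cfd);                     /* hand off, go straight back to accept() */
    }
}
```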

This would also prevent the "listen to socket" thread and the worker threads from ever having to access the same queue concurrently, which saves you from worst-case scenarios like a slow worker thread holding the queue lock just when the "listen to socket" thread wants to drop data into it. The flow looks like this:

    Incoming client connections
            |
            v
    Listener thread: accept() the connection
            |
            v
    Listener/helper queue
            |
            v
    Helper thread
            |
            v
    Shared worker queue
            |
            v
    Worker thread #n
            |
            v
    Worker-specific memory space: read() from the client

As for your other two proposed options:

Use one acceptor socket shared between many threads, and each thread accept connections and processes it.

Messy. The threads would have to somehow take turns issuing the accept() call, and there is no benefit to doing this. You'd also need additional sequencing logic to decide whose "turn" it is.

Use many acceptor sockets which listen the same ip:port, 1 individual acceptor socket in each thread, and the thread that receives the connection then processes it (recv/send)

Not the most portable option; I'd avoid it. Depending on the OS and kernel version, you may also end up having to make your server multi-process (i.e. fork()) rather than multi-threaded.
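The answer above doesn't name the mechanism, but on Linux (kernel 3.9 and later) the usual, non-portable way to have several sockets listen on the same ip:port is the SO_REUSEPORT socket option; treat this as an assumption about what was meant. A minimal, Linux-specific sketch of what each acceptor thread would do:

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create one listening socket; every acceptor thread calls this with the
 * same port, and the kernel spreads incoming connections across them. */
int make_reuseport_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int one = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```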


Assuming you have two 10 Gbit/s network connections and an average frame size of 500 bytes (which is very conservative for a server without interactive use), you'll see around 2–2.5 million packets per second per network card (I doubt you have more than this), which means processing roughly 4–5 packets per microsecond. That is a very comfortable pace for a CPU like the one in your configuration. On those premises, I'd expect your bottleneck to be the network (and the switches you connect to) rather than the spinlock on each socket (resolving a spinlock costs a few CPU cycles, which is far below the limit imposed by the network). In any case, I'd dedicate at most a thread or two per network card (one for reading, one for writing) and not think much more about the socket locking. Most probably your bottleneck is in the application software sitting behind this configuration.
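Restating that back-of-envelope arithmetic (same assumptions as above: 500-byte frames, two 10 Gbit/s cards):

```latex
\frac{10 \times 10^{9}\ \mathrm{bit/s}}{500\ \mathrm{B} \times 8\ \mathrm{bit/B}}
  \approx 2.5 \times 10^{6}\ \mathrm{frames/s\ per\ card}
\qquad
2 \times 2.5 \times 10^{6}\ \mathrm{frames/s}
  \approx 5\ \mathrm{packets}/\mu\mathrm{s}
  \;\Longrightarrow\; \approx 200\ \mathrm{ns\ of\ CPU\ time\ per\ packet}
```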

Even if you do run into trouble, it would probably be better to make some modifications to the kernel software than to keep adding processors or to think about distributing the spinlocks across different sockets. Better still, add more network cards to alleviate the bottleneck.


Use many acceptor sockets which listen the same ip:port, 1 individual acceptor socket in each thread, and the thread that receives the connection then processes it (recv/send)

This is impossible in TCP. Forget it.

Do what everybody else does. One accepting thread, which starts a new thread per accepted socket, or despatches them to a thread pool.
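A minimal sketch of that conventional pattern, with handle_client() standing in for whatever per-connection work the server actually does:

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Placeholder per-connection worker: recv()/send() with the client, then close. */
static void *handle_client(void *arg)
{
    int fd = (int)(intptr_t)arg;
    /* ... recv()/send() here ... */
    close(fd);
    return NULL;
}

/* One thread blocks in accept() and spawns a detached thread per connection.
 * A thread pool would avoid the per-connection create/destroy cost. */
void accept_loop(int listen_fd)
{
    for (;;) {
        int cfd = accept(listen_fd, NULL, NULL);
        if (cfd < 0) { perror("accept"); continue; }

        pthread_t tid;
        int err = pthread_create(&tid, NULL, handle_client, (void *)(intptr_t)cfd);
        if (err != 0) {
            fprintf(stderr, "pthread_create: %s\n", strerror(err));
            close(cfd);
            continue;
        }
        pthread_detach(tid);      /* fire and forget */
    }
}
```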