Eventloop has high ksoftirqd load; nginx does not but does same system-calls. Why?


The short answer:
Sometimes being faster means being slower.


The long answer:

In a single-request setting (one request at a time), epoll_wait always blocks and forces a context-switch. The client (ab in this case) has to receive and process the response and send the next request before the server is woken up again and epoll_wait returns the pending socket. The speed at which the server generates the response does not change this behaviour, since the client always waits for the server to respond before sending the next request.

In a multi-request setting, whether epoll_wait forces a context-switch depends on the performance of the server. If the server application is slower, the chance that the next request is already waiting is higher, in which case no context-switch is necessary because epoll_wait can return immediately. A context-switch is expensive and may take longer than it takes for the next request to arrive, so the time spent on it is wasted.

I realised this when I eliminated the chance of epoll_wait making a context-switch by setting the timeout to zero. This forces epoll_wait to return immediately even if there is no pending request (so it never waits and therefore there is no forced context-switch). In this case the test-application way outperforms nginx even in a multi-request setting.
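The two cases above can be reproduced in isolation. The following is a minimal sketch, not code from the test-application: the helper name epoll_ready_count is made up for illustration, and an eventfd stands in for a client socket. It performs a single epoll_wait call and returns its result.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: performs one epoll_wait call on a fresh epoll
 * instance that watches a single eventfd, and returns the result
 * (number of ready descriptors, or -1 on error).
 * If make_ready is non-zero, an event is queued before the call,
 * which mimics a request that has already arrived. */
int epoll_ready_count(int timeout_ms, int make_ready)
{
    struct epoll_event ev = {0};
    struct epoll_event got;
    uint64_t one = 1;
    int result = -1;
    int epfd = epoll_create1(0);
    int efd = eventfd(0, EFD_NONBLOCK);

    if (epfd >= 0 && efd >= 0) {
        ev.events = EPOLLIN;
        ev.data.fd = efd;
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) == 0 &&
            (!make_ready ||
             write(efd, &one, sizeof one) == (ssize_t)sizeof one)) {
            /* timeout_ms == 0: return immediately, never sleep.
             * timeout_ms == -1: sleep only if nothing is pending. */
            result = epoll_wait(epfd, &got, 1, timeout_ms);
        }
    }
    if (efd >= 0)
        close(efd);
    if (epfd >= 0)
        close(epfd);
    return result;
}
```

With nothing pending, epoll_ready_count(0, 0) returns 0 right away instead of sleeping. With an event already queued, epoll_ready_count(-1, 1) returns 1 without blocking even though the timeout is infinite, which is the situation a slower server ends up in more often. Calling epoll_ready_count(-1, 0) would block forever, because nothing ever arrives.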

I further confirmed my theory by:

  • removing work from nginx (disabling the access-log) which made it slower
  • adding work to the test-application (spin-loop) which made it faster

in a multi-request setting.


For this answer I retook all measurements. All tests were run on an up-to-date Ubuntu box with 8 cores (so more than 5), with the server and the client running on this machine at the same time. The test-application is unchanged.

  • gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
  • nginx version: nginx/1.14.0 (Ubuntu)

I wrote a small script to produce the requests-per-second numbers below. It runs ab with 60,000 requests 100 times and picks the fastest outcome.

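```bash
#!/bin/bash
# Concurrency level passed to ab: 1 or 4 (first argument, default 1)
concurrency=${1:-1}
maxRequests=0
for ((i = 0; i < 100; ++i)); do
    requests=$(ab -n 60000 -c "${concurrency}" http://127.0.0.1:8081/index.html 2>&1 | grep "Requests per second" | cut -d" " -f7)
    requests=${requests%.*}   # strip the fractional part
    maxRequests=$(( maxRequests > requests ? maxRequests : requests ))
done
echo ${maxRequests}
```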



Here again are the baseline values with the same settings as in the question.

```
Requests per second: 13507 (test-application, 1c)
Requests per second: 27648 (test-application, 4c)
Requests per second: 11755 (nginx, 1c)
Requests per second: 31446 (nginx, 4c)
```

To make nginx faster (and therefore slower with 4c) I disabled the access-log.

```
Requests per second: 12028 (nginx, 1c, no access-log)
Requests per second: 28976 (nginx, 4c, no access-log)
```

Here is the result if the test-application never sleeps, achieved by setting the epoll_wait timeout to zero.
What's interesting is the difference in the 1c setting, which shows how much it costs to wake up the test-application when a new request arrives.

```
Requests per second: 20079 (test-application, 1c, spinning)
Requests per second: 34522 (test-application, 4c, spinning)
```
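To put a rough number on that wake-up cost, the two 1c throughput figures can be converted to time per request and subtracted. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes the whole difference between the blocking and the spinning run comes from sleeping and waking up, which probably overstates the pure context-switch cost.

```c
/* Average time per request in microseconds for a given throughput.
 * With one request in flight at a time (1c), this is the full
 * round-trip time of a single request. */
double us_per_request(double requests_per_second)
{
    return 1e6 / requests_per_second;
}

/* Hypothetical estimate: extra microseconds spent per request when the
 * server blocks in epoll_wait compared to when it spins. */
double wakeup_cost_us(double blocking_rps, double spinning_rps)
{
    return us_per_request(blocking_rps) - us_per_request(spinning_rps);
}
```

For the measurements in this answer, wakeup_cost_us(13507, 20079) comes out at roughly 24 microseconds per request (about 74 microseconds blocking versus about 50 microseconds spinning).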

For these last measurements I added some work to the test-application, making it artificially slower to increase the chance of avoiding a context-switch. I added the code below to the end of the inner for-loop (after the close) and varied the initial value of j to get the different results.

```c
uint8_t j = 50;
while (--j != 0) {
    uint8_t i = 0;
    while (--i != 0)
        asm("");
}
```

```
Requests per second: 12910 (test-application, 1c, j=50)
Requests per second: 12126 (test-application, 1c, j=100)
Requests per second: 11634 (test-application, 1c, j=150)
Requests per second: 11020 (test-application, 1c, j=200)
Requests per second: 10235 (test-application, 1c, j=250)

Requests per second: 27447 (test-application, 4c, j=25)
Requests per second: 29464 (test-application, 4c, j=50)
Requests per second: 31334 (test-application, 4c, j=75)
Requests per second: 32079 (test-application, 4c, j=100)
Requests per second: 33510 (test-application, 4c, j=125)
Requests per second: 34241 (test-application, 4c, j=150)
Requests per second: 34189 (test-application, 4c, j=175)
Requests per second: 33855 (test-application, 4c, j=200)
Requests per second: 33328 (test-application, 4c, j=250)
```