Gunicorn does not respond to more than 6 requests at a time



What you describe appears to be an indicator that you are running the Gunicorn server with the sync worker class while serving an I/O bound application. Can you share your Gunicorn configuration?

Is it possible that Google's platform has some kind of autoscaling feature (I'm not really familiar with their service) that's being triggered while your Kubernetes configuration does not?

Generally speaking, increasing the number of cores for a single instance will only help if you also increase the number of workers spawned to handle incoming requests. Please see Gunicorn's design documentation, with special emphasis on the worker types section (and why sync workers are suboptimal for I/O bound applications) - it's a good read and provides a more detailed explanation of this problem.
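For reference, here's a minimal sketch of what that could look like in a `gunicorn.conf.py` (the bind address is just a placeholder, and the `(2 x cores) + 1` formula is the rule of thumb from Gunicorn's own docs):

    # gunicorn.conf.py - illustrative sketch, not a drop-in config
    import multiprocessing

    bind = "0.0.0.0:8000"  # placeholder address/port

    # Gunicorn's documented rule of thumb for worker count: (2 x cores) + 1
    workers = multiprocessing.cpu_count() * 2 + 1

    # For I/O bound apps, an async worker class usually serves many more
    # concurrent requests than the default "sync" one.
    worker_class = "gevent"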

Just for fun, here's a small exercise to compare the two approaches:

    import time

    def app(env, start_response):
        time.sleep(1)  # takes 1 second to process the request
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'Hello World']

Running Gunicorn with 4 sync workers:

    gunicorn --bind '127.0.0.1:9001' --workers 4 --worker-class sync --chdir app app:app

Let's trigger 8 requests at the same time:

    ab -n 8 -c 8 "http://localhost:9001/"

    This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking localhost (be patient).....done

    Server Software:        gunicorn/19.8.1
    Server Hostname:        localhost
    Server Port:            9001

    Document Path:          /
    Document Length:        11 bytes

    Concurrency Level:      8
    Time taken for tests:   2.007 seconds
    Complete requests:      8
    Failed requests:        0
    Total transferred:      1096 bytes
    HTML transferred:       88 bytes
    Requests per second:    3.99 [#/sec] (mean)
    Time per request:       2006.938 [ms] (mean)
    Time per request:       250.867 [ms] (mean, across all concurrent requests)
    Transfer rate:          0.53 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    1   0.2      1       1
    Processing:  1003 1504 535.7   2005    2005
    Waiting:     1002 1504 535.8   2005    2005
    Total:       1003 1505 535.8   2006    2006

    Percentage of the requests served within a certain time (ms)
      50%   2006
      66%   2006
      75%   2006
      80%   2006
      90%   2006
      95%   2006
      98%   2006
      99%   2006
     100%   2006 (longest request)

Around 2 seconds to complete the test. That's the behavior you saw in your tests - the first 4 requests kept your workers busy, and the second batch was queued until the first batch was processed.
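The arithmetic behind that number is simple: with blocking workers, total time is roughly `ceil(requests / workers) x request duration`. A quick sanity check (the 1-second duration matches the `sleep` in the example app):

    import math

    def sync_total_time(n_requests, n_workers, seconds_per_request):
        # Each sync worker handles one request at a time, so requests are
        # served in batches of n_workers, and every batch takes one full
        # request duration.
        return math.ceil(n_requests / n_workers) * seconds_per_request

    print(sync_total_time(8, 4, 1))  # 8 requests, 4 sync workers -> 2 seconds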


Same test, but let's tell Gunicorn to use an async worker:

    gunicorn --bind '127.0.0.1:9001' --workers 4 --worker-class gevent --chdir app app:app

Same test as above:

    ab -n 8 -c 8 "http://localhost:9001/"

    This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking localhost (be patient).....done

    Server Software:        gunicorn/19.8.1
    Server Hostname:        localhost
    Server Port:            9001

    Document Path:          /
    Document Length:        11 bytes

    Concurrency Level:      8
    Time taken for tests:   1.005 seconds
    Complete requests:      8
    Failed requests:        0
    Total transferred:      1096 bytes
    HTML transferred:       88 bytes
    Requests per second:    7.96 [#/sec] (mean)
    Time per request:       1005.463 [ms] (mean)
    Time per request:       125.683 [ms] (mean, across all concurrent requests)
    Transfer rate:          1.06 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    1   0.4      1       2
    Processing:  1002 1003   0.6   1003    1004
    Waiting:     1001 1003   0.9   1003    1004
    Total:       1002 1004   0.9   1004    1005

    Percentage of the requests served within a certain time (ms)
      50%   1004
      66%   1005
      75%   1005
      80%   1005
      90%   1005
      95%   1005
      98%   1005
      99%   1005
     100%   1005 (longest request)

We actually doubled the application's throughput here - it took only ~1 second to reply to all the requests.

To understand what happened, Gevent has a great tutorial about its architecture, and this article gives a more in-depth explanation of coroutines.
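gevent itself is a third-party install, but the same idea can be sketched with the standard library's asyncio: while one coroutine waits on I/O (simulated here by `asyncio.sleep`), the event loop runs the others, so 8 one-second waits overlap instead of queuing:

    import asyncio
    import time

    async def handle_request(i):
        # Simulate 1 second of I/O wait; the event loop is free to run
        # the other coroutines while this one sleeps.
        await asyncio.sleep(1)
        return f"response {i}"

    async def main():
        start = time.monotonic()
        results = await asyncio.gather(*(handle_request(i) for i in range(8)))
        elapsed = time.monotonic() - start
        # All 8 "requests" finish in roughly 1 second, not 8.
        print(f"{len(results)} requests in {elapsed:.1f}s")

    asyncio.run(main())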


I apologize in advance if I was way off about the actual cause of your problem (I do believe your initial post is missing some information for anyone to give a conclusive answer). If not to you, I hope this'll be helpful to someone else. :)

Also, do notice that I've oversimplified things a lot (my example was a simple proof of concept); tweaking an HTTP server's configuration is mostly a trial-and-error exercise - it all depends on the type of workload the application has and the hardware it runs on.