Troubleshooting Site Slowness on a Nginx + Gunicorn + Django Stack


That's a lot of sites to host on a server with only 1GB of RAM. You're at nearly 100% memory utilization, and the numbers you have are probably idle ("standby") numbers. The RAM usage of each process can and will balloon while serving requests. Right off the bat, you need to add more RAM to this instance or, better yet, move some of the sites onto another server.
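To see where the memory is actually going, rather than relying on idle numbers, you could take a snapshot of per-process usage. The sketch below assumes a Linux host with the standard /proc filesystem and sums the VmRSS (resident) and VmSwap (swapped-out) fields of /proc/<pid>/status per command name. The function names are hypothetical, and Gunicorn workers may show up as "gunicorn" or under the Python interpreter's name depending on how they were launched. It is only a point-in-time view: on a box that is thrashing, the numbers will shift from one run to the next.

```python
import os
from collections import defaultdict


def parse_status(text: str) -> dict:
    """Parse the 'Key:   value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def kb(value: str) -> int:
    """Turn a field such as '123456 kB' into an int (0 if the field is missing)."""
    return int(value.split()[0]) if value else 0


def memory_by_command(proc_root: str = "/proc") -> dict:
    """Sum resident (VmRSS) and swapped-out (VmSwap) memory per command name.

    Kernel threads have no VmRSS/VmSwap lines and therefore contribute 0.
    """
    totals = defaultdict(lambda: {"procs": 0, "rss_kb": 0, "swap_kb": 0})
    for pid in os.listdir(proc_root):
        if not pid.isdigit():  # skip /proc/self, /proc/meminfo, etc.
            continue
        try:
            with open(os.path.join(proc_root, pid, "status")) as f:
                fields = parse_status(f.read())
        except OSError:
            continue  # process exited mid-scan or is not readable
        entry = totals[fields.get("Name", "?")]
        entry["procs"] += 1
        entry["rss_kb"] += kb(fields.get("VmRSS", ""))
        entry["swap_kb"] += kb(fields.get("VmSwap", ""))
    return dict(totals)


if __name__ == "__main__":
    usage = memory_by_command()
    # Print the 15 command names using the most resident memory.
    for name, e in sorted(usage.items(), key=lambda kv: -kv[1]["rss_kb"])[:15]:
        print(f"{name:20} procs={e['procs']:3} "
              f"rss={e['rss_kb'] / 1024:8.1f} MB swap={e['swap_kb'] / 1024:8.1f} MB")
```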

As to your questions:

  1. Where'd you get the idea that sites become "inactive" and that Gunicorn then has to load the site again? That's rubbish. As long as the Gunicorn process is running (i.e., not terminated manually or by an error on the site), it remains fully initialized and ready to go, whether it's been an hour or a month.

  2. You're hacking at the leaves here, leaving the root untouched. There's nothing out of the ordinary about the memory usage of each Gunicorn process; it needs RAM to run. Your problem is trying to run too much on a severely underpowered server. No optimization is going to save you here. You need more RAM or more servers. Probably both.

  3. No need. Again, the problem has already been identified, and pretty clearly, by the numbers you posted.

  4. There's no reliable way to know which processes are getting swapped. It changes every second and depends on which processes are actively running and need more RAM, and which are inactive or simply less active. When your server is this strapped for resources, it spends half its time just figuring out which process to juggle next, especially if they're all active and vying for resources.

  5. Yes. Gunicorn recommends 2*cores+1. So on a dual-core system, that's 5; on a quad-core, 9. However, there's no way you could run even 5 workers for each of these sites on this one system. You can't even run 1 worker for each reliably.

  6. It depends on the "things". But when multiple sites are hosted on the same server, that server is usually a beast spec-wise. On a small instance like yours (probably a VPS), especially with only 1GB of RAM, one site is pretty much your limit. Two, maybe.
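The arithmetic behind items 5 and 6 can be made concrete. The sketch below takes Gunicorn's 2*cores+1 rule of thumb and caps it by how many workers per site would actually fit in RAM. The per-worker size, the memory reserved for the OS, nginx and anything else on the box, and the number of sites are all assumptions you would replace with measured values; the function names are hypothetical.

```python
import multiprocessing


def recommended_workers(cores: int) -> int:
    """Gunicorn's rule of thumb: (2 x cores) + 1."""
    return 2 * cores + 1


def workers_that_fit(total_ram_mb: int, reserved_mb: int,
                     per_worker_mb: int, sites: int) -> int:
    """How many workers *per site* fit in RAM without swapping.

    All inputs are estimates you must measure yourself; none of them is a
    Gunicorn setting. A result of 0 means not even one worker per site fits.
    """
    available = max(total_ram_mb - reserved_mb, 0)
    return available // (per_worker_mb * sites)


def plan(cores: int, total_ram_mb: int, reserved_mb: int,
         per_worker_mb: int, sites: int) -> int:
    """Use the CPU rule of thumb, but never exceed what memory allows."""
    return min(recommended_workers(cores),
               workers_that_fit(total_ram_mb, reserved_mb, per_worker_mb, sites))


if __name__ == "__main__":
    cores = multiprocessing.cpu_count()
    # Hypothetical numbers: a 1 GB box, ~300 MB reserved for the OS and other
    # services, ~60 MB resident per Django worker, 10 sites.
    print(plan(cores, total_ram_mb=1024, reserved_mb=300, per_worker_mb=60, sites=10))
```

With those illustrative numbers, 724 MB // (60 MB x 10 sites) leaves room for only one worker per site, whatever the core count. Whatever number you settle on is what you would pass to Gunicorn's workers setting (the -w/--workers command-line option).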


1) Not sure what you mean by "inactive". As in, disabled by nginx? Or just too slow to respond?

2 and 3) django-debug-toolbar and django-debug-logging are a good place to start. If they don't help, it's time to move to server-level profiling to see which processes are causing the problem.

4) Use top; see the question "How to find out which processes are swapping in Linux?"

5) Yes: benchmarking. Pick a benchmarking tool (e.g. ApacheBench) and run tests against your current configuration. Tweak something. Run the tests again. Repeat until your performance problems are gone! For best results, use traffic that is similar to your live traffic (in terms of URL distribution, GET/POST mix, etc.).

6) Yes, at both the nginx and app levels. You will probably get the most benefit by profiling each site and reducing its memory usage (see 2 and 3).
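If the debug toolbar doesn't point at an obvious culprit, a cheap complement for the app-level profiling in 2 and 3 is to log slow requests in production. This is a swapped-in technique, not part of django-debug-toolbar or django-debug-logging: a minimal middleware written against Django's callable middleware interface (an object constructed with get_response and called once per request), so it needs no Django imports of its own. The class name, logger name and 0.5-second threshold are hypothetical.

```python
import logging
import time

logger = logging.getLogger("slow_requests")  # hypothetical logger name


class SlowRequestLoggingMiddleware:
    """Log any request whose handling takes at least THRESHOLD_SECONDS."""

    THRESHOLD_SECONDS = 0.5  # hypothetical cut-off; tune per site

    def __init__(self, get_response):
        # Django calls this once at startup with the next handler in the chain.
        self.get_response = get_response

    def __call__(self, request):
        # Time everything downstream of this middleware: other middleware,
        # the view, template rendering and database queries.
        start = time.perf_counter()
        response = self.get_response(request)
        elapsed = time.perf_counter() - start
        if elapsed >= self.THRESHOLD_SECONDS:
            logger.warning("slow request: %s %s took %.3fs",
                           request.method, request.path, elapsed)
        return response
```

To try it, you would add its dotted path (for example a hypothetical myproject.middleware.SlowRequestLoggingMiddleware) near the top of the MIDDLEWARE setting, so the measured time also covers the middleware listed after it.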


Regarding:

"Regarding your answer to 5, I believe what Gunicorn recommends is overkill."

I recently performed some ad hoc testing with the number of workers and found that, assuming you have enough RAM, the 2*cores+1 rule of thumb is pretty accurate: requests/sec increased almost linearly until I got close to that number, then dropped off as the OS started to thrash.

Since results depend greatly on workload, try different values and see where your performance peaks.
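The sweep described above can be sketched in Python, although ApacheBench (for example ab -n 1000 -c 10 <url>) is the simpler tool for the measurement itself. The sketch assumes you restart Gunicorn with a different worker count between runs and feed it a list of GET URLs sampled from your real traffic; all function names are hypothetical.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_get_status(url: str) -> int:
    """Issue one GET and return the HTTP status code, including 4xx/5xx.

    Connection errors (e.g. the server is down) are not caught and propagate.
    """
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def requests_per_second(urls, concurrency=10, fetch=http_get_status):
    """Request every URL in `urls` using `concurrency` threads.

    Returns (requests_per_second, failures), where a failure is any
    response with status 400 or above.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(fetch, urls))
    elapsed = time.perf_counter() - start
    rps = len(statuses) / elapsed if elapsed > 0 else float("inf")
    failures = sum(1 for s in statuses if s >= 400)
    return rps, failures


def find_peak(results: dict) -> int:
    """Given {worker_count: requests_per_second}, return the best worker count."""
    return max(results, key=results.get)
```

A run would look like this: for each candidate worker count, restart Gunicorn with that many workers, call requests_per_second on the same URL list, record the throughput, and finally pass the {workers: rps} dict to find_peak. Runs with failures are worth discarding, since a server that answers quickly with 502s can look deceptively fast, and a Python thread pool can itself become the bottleneck at high request rates.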