
Docker/Kubernetes + Gunicorn/Celery - Multiple Workers vs Replicas?


These technologies aren't as similar as they initially seem. They address different portions of the application stack and are actually complementary.

Gunicorn is for scaling web request concurrency, while Celery should be thought of as a task queue with workers. We'll get to Kubernetes soon.


Gunicorn

Web request concurrency is primarily limited by network I/O, i.e. it is "I/O bound". Such workloads scale well with threads, because a thread that is blocked waiting on I/O lets the other threads run. If you find that request concurrency is limiting your application, increasing the number of Gunicorn worker threads may well be the place to start.
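To make the I/O-bound point concrete, here is a minimal sketch using only the standard library. It is not Gunicorn itself: `fake_request` and `run_batch` are hypothetical names, and `time.sleep` stands in for a request that waits on the network or a database.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(delay: float) -> float:
    """Stand-in for a request that spends its time waiting on I/O."""
    time.sleep(delay)  # sleeping releases the GIL, much like a blocking socket read
    return delay


def run_batch(n_requests: int, n_threads: int, delay: float = 0.2) -> float:
    """Handle n_requests using n_threads and return the elapsed wall time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Consume the iterator so every request finishes before we stop the clock.
        list(pool.map(fake_request, [delay] * n_requests))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"1 thread:  {run_batch(4, 1):.2f}s")
    print(f"4 threads: {run_batch(4, 4):.2f}s")
```

With one thread the waits add up; with four threads they overlap, so the batch should finish in roughly the time of a single request. The same reasoning is why raising `--threads` can help an I/O-bound web app.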


Celery

Heavy-lifting tasks, e.g. compressing an image or running an ML algorithm, are "CPU bound". They benefit far less from threading than from more CPUs. These tasks should be offloaded to Celery workers and parallelized there.


Kubernetes

Where Kubernetes comes in handy is by providing out-of-the-box horizontal scalability and fault tolerance.

Architecturally, I'd use two separate k8s deployments to represent the different scalability concerns of your application: one deployment for the Django app and another for the Celery workers. This allows you to scale request throughput and processing power independently.

I run Celery workers pinned to a single core per container (-c 1). This vastly simplifies debugging and adheres to Docker's "one process per container" mantra. It also gives you the added benefit of predictability, as you can scale the processing power on a per-core basis by incrementing the replica count.
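As a rough illustration of per-core scaling, the replica count can be estimated from the task arrival rate and the average task duration. This is a back-of-the-envelope sketch, not something from the setup described above: the function name and the 0.7 target utilization are assumptions, and it presumes every replica runs with -c 1.

```python
import math


def replicas_needed(tasks_per_second: float,
                    avg_task_seconds: float,
                    target_utilization: float = 0.7) -> int:
    """Estimate the Celery replica count when each replica runs `-c 1`.

    Average number of busy cores = arrival rate * average task duration.
    Dividing by a target utilization leaves headroom for bursts.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    busy_cores = tasks_per_second * avg_task_seconds
    return max(1, math.ceil(busy_cores / target_utilization))


if __name__ == "__main__":
    # 5 tasks/s at 2 s each keeps 10 cores busy on average.
    print(replicas_needed(5, 2.0))
```

Treat the result as a starting point for the replica count and adjust it based on observed queue length.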

Scaling the Django app deployment is where you'll need to do your own research to find the best settings for your particular application. Again, stick to --workers 1 so there is a single process per container, but experiment with --threads to find the best value. And again, leave horizontal scaling to Kubernetes by simply changing the replica count.

HTH. It's definitely something I had to wrap my head around when working on similar projects.
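Put together, the two-deployment layout might look like the sketch below. It builds minimal Deployment manifests as plain Python dicts and prints them as JSON, which kubectl accepts alongside YAML. The image name, the `myproject` module, the port, the thread count, and the replica counts are all placeholders, not values from this answer.

```python
import json


def deployment(name: str, image: str, command: list[str], replicas: int) -> dict:
    """Build a minimal apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,  # horizontal scaling knob
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ]
                },
            },
        },
    }


IMAGE = "registry.example.com/myproject:latest"  # hypothetical image

# Web tier: one Gunicorn process per container, several threads.
web = deployment(
    "django-web", IMAGE,
    ["gunicorn", "myproject.wsgi:application",
     "--workers", "1", "--threads", "4", "--bind", "0.0.0.0:8000"],
    replicas=3,
)

# Worker tier: one Celery worker process per container.
worker = deployment(
    "celery-worker", IMAGE,
    ["celery", "-A", "myproject", "worker", "-c", "1"],
    replicas=5,
)

if __name__ == "__main__":
    manifest = {"apiVersion": "v1", "kind": "List", "items": [web, worker]}
    print(json.dumps(manifest, indent=2))
```

Because the two tiers are separate deployments, changing `replicas` on one leaves the other untouched, which is the independent scaling described above.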


We run a Kubernetes cluster with Django and Celery, and implemented the first approach. Here are some of my thoughts on this trade-off and why we chose this approach.

In my opinion Kubernetes is all about horizontally scaling your replicas (the pods managed by a deployment). In that respect it makes most sense to keep your deployments as single-purpose as possible, and to increase the number of pods (and nodes, if you run out) as demand increases. The LoadBalancer thus manages traffic to the Gunicorn deployment, and the Redis queue distributes tasks to the Celery workers. This ensures that the underlying Docker containers are simple and small, and that we can scale them individually (and automagically) as we see fit.

As for your question of how many workers or how much concurrency you need per deployment, that really depends on the underlying hardware your Kubernetes cluster runs on, and it takes experimentation to get right.

For example, we run our cluster on Amazon EC2 and experimented with different EC2 instance types and worker counts to balance performance and cost. The more CPU you have per instance, the fewer instances you need and the more workers you can deploy per instance. But we found that, in our case, deploying a larger number of smaller instances is cheaper. We now deploy multiple m4.large instances with 3 workers per deployment.

Interesting side note: we had really bad performance with Gunicorn in combination with the Amazon load balancers, so we switched to uWSGI and saw a great performance increase. But the principles are the same.