
How to work a job queue with kubernetes with scaling


This may be a "goofy/hacky" answer, but it's simple, robust, and I've been using it in a production system for months now.

I have a similar system: a queue that sometimes sits empty and sometimes gets slammed. I wrote my queue processor the same way: it handles one message at a time and terminates when the queue is empty. It is set up to run as a Kubernetes Job.

The trick is this: I created a CronJob to regularly start a single new instance of the job, and the job allows unlimited parallelism. If the queue is empty, the new instance terminates immediately ("scales down"). If the queue is slammed and the previous job hasn't finished yet, another instance starts ("scales up").

No need to futz with querying the queue and scaling a StatefulSet or anything, and no resources are consumed while the queue sits empty. You may have to adjust the CronJob interval to fine-tune how fast it reacts to the queue filling up, but it should react pretty well.
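
For concreteness, here is a minimal sketch of what such a CronJob manifest might look like. The name, image, and one-minute schedule are placeholders, not the actual values from my system:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: queue-worker               # hypothetical name
spec:
  schedule: "*/1 * * * *"          # try to start a new worker every minute; tune to taste
  concurrencyPolicy: Allow         # overlapping runs are fine; this is how it "scales up"
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: worker
              image: registry.example.com/queue-worker:latest   # placeholder image
```

With `concurrencyPolicy: Allow` (the default), a busy queue accumulates concurrent Jobs, while an empty queue just produces short-lived pods that exit right away.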


This is a common pattern, and there are several ways to architect a solution.

A common solution is to have an app with a set of workers constantly polling your queue (this could be your Python script, but you would need to turn it into a long-running service). You'll generally want to run it as a Kubernetes Deployment, possibly with a Horizontal Pod Autoscaler driven by a queue metric or by CPU usage; a sketch of such a Deployment follows.
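
As a rough illustration, the worker Deployment might look like the following; the name and image are assumptions, not something from your setup:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: queue-worker
spec:
  replicas: 2                      # baseline number of pollers
  selector:
    matchLabels:
      app: queue-worker
  template:
    metadata:
      labels:
        app: queue-worker
    spec:
      containers:
        - name: worker
          image: registry.example.com/queue-worker:latest   # placeholder: your polling daemon
          resources:
            requests:
              cpu: 200m            # CPU requests are needed for utilization-based autoscaling
```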

In your case, you'll want to make your script a daemon that polls the queue for new items (I assume you are already handling race conditions between parallel workers). Deploy that daemon with a Deployment like the one above, and scale it up and down based on metrics or a schedule; a CPU-based autoscaler sketch follows.
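
A CPU-based autoscaler targeting that Deployment could then look roughly like this (autoscaling/v2 syntax; scaling on queue depth instead would require an external or custom metrics pipeline such as KEDA or the Prometheus Adapter):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker             # matches the Deployment above
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas when average CPU passes 70%
```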

There are also ready-made job schedulers for many different languages. A very popular one is Airflow, which already has the concept of 'workers', but it may be overkill for a single Python script.