Kubernetes workload scaling on multi-threaded code Kubernetes workload scaling on multi-threaded code kubernetes kubernetes

Kubernetes workload scaling on multi-threaded code


You'll typically use a Kubernetes Deployment object to deploy application code. That has a replicas: setting, which launches some number of identical disposable Pods. Each Pod has a container, and each pod will independently run the code block you quoted above.

The challenge here is distributing work across the Pods. If each Pod generates its own 50,000 work items, they'll all do the same work and things won't happen any faster. Just running your application in Kubernetes doesn't give you any prebuilt way to share thread pools or task queues between Pods.

A typical approach here is to use a job queue system; RabbitMQ is a popular open-source option. One part of the system generates the tasks and writes them into RabbitMQ. One or more workers reads jobs from the queue and runs them. You can set this up and demonstrate it to yourself without using container technology, then repackage it in Docker or Kubernetes just changing the RabbitMQ broker address at deploy time.

In this setup I'd probably have the worker run jobs serially, one at a time, with no threading. That will simplify the implementation of the worker. If you want to run more jobs in parallel, run more workers; in Kubernetes, increase the Deployment replica: count.


In Kubernetes, when we deploy containers as Pods we can include the resources.limits.cpu and resources.requests.cpu fields for each container in the Pod's manifest:

resources:  requests:    cpu: "1000m"  limits:    cpu: "2000m"

In the example above we have a request for 1 CPU and a limit for a maximum of 2 CPUs. This means the Pod will be scheduled to a worker node which can satisfy above resource requirements.

One cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers and 1 hyperthread on bare-metal Intel processors.

We can vertically scale by increasing / decreasing the values for the requests and limits fields. Or we can horizontally scale by increasing / decreasing the number of replicas of the pod.

For more details about resource units in Kubernetes here