
Kubernetes different container args depending on number of pods in replica set


Before giving the Kubernetes-specific answer, I wanted to point out that this problem seems to push cluster coordination down into the app, which is almost by definition harder than using a distributed-system primitive designed for that task. For example, if every new worker registers itself in etcd, the others can watch those keys to detect changes. No one then needs to destroy a running application just to update its list of peers, their contact information, their capacity, their current workload, or whatever other information you would enjoy having while building a distributed worker system.
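As a rough sketch of that idea (my addition, not part of the original answer), using the python-etcd3 client with an illustrative /workers/ key prefix and etcd endpoint, each worker could register itself under a lease and watch the prefix for membership changes:

import socket
import etcd3

# Assumed etcd endpoint; adjust for your environment.
etcd = etcd3.client(host="etcd.example.internal", port=2379)

# Register this worker under a lease so the key disappears if the worker dies.
lease = etcd.lease(ttl=30)
etcd.put("/workers/" + socket.gethostname(), "ready", lease=lease)

# Every put/delete under /workers/ is a membership change, so peers can
# react without anyone restarting the whole fleet.
events, cancel = etcd.watch_prefix("/workers/")
for event in events:
    print("membership change:", event.key, type(event).__name__)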

But, on with the show:


If you want stable identifiers, then a StatefulSet is the modern answer. Whether that is an exact fit for your situation depends on whether, in your problem domain, id:0 being "rebooted" still counts as id:0, or whether the fact that it has stopped and started now disqualifies it from being id:0.
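As a small illustration (my addition, assuming a StatefulSet named "worker"): StatefulSet pods get predictable names like worker-0, worker-1, ..., and a pod's hostname matches its name, so the stable ordinal can be read straight off it:

import socket

hostname = socket.gethostname()                  # e.g. "worker-3"
set_name, _, ordinal = hostname.rpartition("-")  # ("worker", "-", "3")
worker_id = int(ordinal)                         # stable across restarts of this pod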

Keeping a current view of the cluster size is trickier. If you are willing to be flexible in the launch mechanism, you can have a pre-launch binary populate the environment right before spawning the actual worker (that example reads from etcd directly, but the same principle holds for querying the Kubernetes API and then launching), as sketched below.
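A minimal launcher sketch along those lines (my own illustration; the app=worker label, the environment variable names, and the worker binary path are all assumptions): query the API for peers, export what you found, then exec the real process so it starts with the environment already populated.

import os
from kubernetes import client, config

config.load_incluster_config()
v1 = client.CoreV1Api()

# The pod's own namespace is mounted next to its service account token.
with open("/var/run/secrets/kubernetes.io/serviceaccount/namespace") as f:
    namespace = f.read().strip()

pods = v1.list_namespaced_pod(namespace, label_selector="app=worker").items
peers = sorted(pod.metadata.name for pod in pods)

os.environ["NUM_WORKERS"] = str(len(peers))
os.environ["WORKER_PEERS"] = ",".join(peers)

# Replace this process with the actual worker binary (path is an assumption),
# so it sees the variables set above from the moment it starts.
os.execvp("/usr/local/bin/worker", ["worker"])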

You could do that same trick in a more static manner by having an initContainer write the current state of affairs to a file, which the app would then read in. Or, due to all Pod containers sharing networking, the app could contact a "sidecar" container on localhost to obtain that information via an API.
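For the initContainer variant, here is a sketch of the reading side (my addition; the /etc/cluster/peers path would be a shared emptyDir mount and is purely illustrative). The sidecar variant is the same idea, except the data comes from an HTTP call to localhost instead of a file.

# Written by the initContainer, one peer name per line.
PEERS_FILE = "/etc/cluster/peers"

with open(PEERS_FILE) as f:
    peers = [line.strip() for line in f if line.strip()]

num_workers = len(peers)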

So far so good, except for the

on size changes for all workers to be killed and new one spawned

The best answer I have for that requirement is this: if the app must know its peers at launch time, then I am pretty sure you have left the realm of "scale $foo --replicas=5" and entered the "destroy the peers and start all afresh" realm, via kubectl delete pods -l some-label=of-my-pods. That, thankfully, is exactly what updateStrategy: type: OnDelete gives you when combined with the delete pods command.


In the end, I tried something different: I used the Kubernetes API to get the number of running pods carrying the same label. This is Python code using the Kubernetes Python client.

import socket

from kubernetes import client
from kubernetes import config

# Running inside the cluster, so use the pod's service account credentials.
config.load_incluster_config()
v1 = client.CoreV1Api()

# The pod's own namespace is mounted alongside the service account token.
with open(
        '/var/run/secrets/kubernetes.io/serviceaccount/namespace',
        'r') as f:
    namespace = f.readline()

# Collect the names of all pods that share the worker label.
workers = []
for pod in v1.list_namespaced_pod(
        namespace,
        watch=False,
        label_selector="app=worker").items:
    workers.append(pod.metadata.name)

# Sorting gives every pod the same view of the list, so each pod's index
# in it serves as a stable worker id.
workers.sort()
num_workers = len(workers)
worker_id = workers.index(socket.gethostname())
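From there, turning the discovered values into per-pod arguments is straightforward; this continues the snippet above, and the flag names are made up for illustration:

# Illustrative only: feed the discovered values to the worker as arguments.
args = ["--worker-id", str(worker_id), "--num-workers", str(num_workers)]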