Preventing the Kubernetes scheduler from running all pods on a single node of the cluster


Use podAntiAffinity

Reference: Kubernetes in Action Chapter 16. Advanced scheduling

podAntiAffinity with requiredDuringSchedulingIgnoredDuringExecution can be used to prevent two replicas of the same pod from being scheduled onto the same hostname. If you prefer a more relaxed constraint, use preferredDuringSchedulingIgnoredDuringExecution instead.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          # Hard requirement: do not schedule an "nginx" pod onto a node
          # that already runs one.
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname  # anti-affinity scope is the host
            labelSelector:
              matchLabels:
                app: nginx
      containers:
      - name: nginx
        image: nginx:latest
```

Kubelet --max-pods

You can specify the maximum number of pods per node in the kubelet configuration, so that when one or more nodes go down, Kubernetes is prevented from saturating the remaining nodes with pods from the failed node(s).
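As a sketch, this limit can be set either with the kubelet's --max-pods flag or with the maxPods field of a KubeletConfiguration file passed via --config (the value 30 below is an arbitrary example; the kubelet default is 110):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Cap the number of pods this node will accept.
# 30 is an arbitrary example value; the kubelet default is 110.
maxPods: 30
```

Note that this is a per-node capacity cap, not a spreading rule: it bounds how many pods can land on any one node, but on its own it does not make the scheduler distribute pods evenly.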


I think the inter-pod anti-affinity feature will help you. Inter-pod anti-affinity allows you to constrain which nodes your pod is eligible to be scheduled on, based on the labels of pods already running on those nodes. Here is an example.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: nginx-service
  name: nginx-service
spec:
  replicas: 3
  selector:
    matchLabels:
      service-type: nginx
  template:
    metadata:
      labels:
        service-type: nginx
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: service-type
                  operator: In
                  values:
                  - nginx
              topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx-service
        image: nginx:latest
```

Note: I use preferredDuringSchedulingIgnoredDuringExecution here since you have more pods than nodes.

For more detailed information, you can refer to the "Inter-pod affinity and anti-affinity (beta feature)" part of the following link: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/


Use Pod Topology Spread Constraints

As of 2021 (v1.19 and up), you can use Pod Topology Spread Constraints (topologySpreadConstraints), which are enabled by default, and I found them more suitable than podAntiAffinity for this case.

The major difference is that anti-affinity can restrict you to at most one pod per node, whereas Pod Topology Spread Constraints can allow N pods per node while keeping the distribution even.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-example-deployment
spec:
  replicas: 6
  selector:
    matchLabels:
      app: nginx-example
  template:
    metadata:
      labels:
        app: nginx-example
    spec:
      containers:
      - name: nginx
        image: nginx:latest
      # This controls how evenly the pods are spread.
      # For example, with 3 nodes available, 2 pods
      # are scheduled on each node.
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: nginx-example
```

For more details, see KEP-895 and the official Kubernetes blog post introducing the feature.