Kubernetes HPA Auto Scaling Velocity


First of all, the 80% CPU utilisation is not a threshold but a target value.

The HPA algorithm for calculating the desired number of replicas is based on the following formula:

X = ceil(N * (C/T))

Where:

  • X: desired number of replicas
  • N: current number of replicas
  • C: current value of the metric
  • T: target value for the metric

In other words, the algorithm aims at calculating a replica count that keeps the observed metric value as close as possible to the target value.
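As a minimal sketch of the formula above: the function name `desired_replicas` is only illustrative, and the sketch assumes the result is rounded up to a whole number of pods, which is how the Kubernetes documentation states the calculation.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """X = ceil(N * (C / T)), using the symbols from the formula above."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    # Multiply before dividing so whole-number inputs stay exact as long as possible.
    return math.ceil(current_replicas * current_value / target_value)


# 4 pods averaging 100% CPU against an 80% target: 4 * 100 / 80 = 5.0
print(desired_replicas(4, 100, 80))  # 5
# 4 pods averaging 40% CPU against an 80% target: 4 * 40 / 80 = 2.0
print(desired_replicas(4, 40, 80))   # 2
```

The same function covers scaling in both directions, because the ratio C/T is above 1 when the metric exceeds the target and below 1 when it falls short.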

In your case, this means if the average CPU utilisation across the pods of your app is below 80%, the HPA tends to decrease the number of replicas (to make the CPU utilisation of the remaining pods go up). On the other hand, if the average CPU utilisation across the pods is above 80%, the HPA tends to increase the number of replicas, so that the CPU utilisation of the individual pods decreases.

The number of replicas that are added or removed in a single step depends on how far apart the current metric value is from the target value and on the current number of replicas. This decision is internal to the HPA algorithm and you can't directly influence it. The only contract that the HPA has with its users is to keep the metric value as close as possible to the target value.
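To make the dependence on both the distance from the target and the current replica count concrete, here is a rough sketch that prints the step size for a few combinations. The helper `scaling_step` is hypothetical, and it deliberately ignores details of the real controller such as its tolerance band and the min/max replica limits.

```python
import math


def scaling_step(current_replicas: int, current_value: float, target_value: float) -> int:
    """Replicas added (positive) or removed (negative) in one step."""
    desired = math.ceil(current_replicas * current_value / target_value)
    return desired - current_replicas


TARGET = 80  # target average CPU utilisation in percent

for replicas in (2, 10, 50):
    for cpu in (60, 90, 120):
        step = scaling_step(replicas, cpu, TARGET)
        print(f"replicas={replicas:>2} cpu={cpu:>3}% step={step:+d}")
```

With a target of 80%, 10 replicas at 120% grow by 5, while 50 replicas at the same utilisation grow by 25: the same relative overshoot produces a larger absolute step when more replicas are running.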

If you need a very specific autoscaling behaviour, you can write a custom controller (or operator) to autoscale your application instead of using the HPA.


The documentation at https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details explains the algorithm the HPA uses, including the formula for calculating the number of "desired replicas".

If I recall correctly, there were some (positive) changes to the HPA algorithm in v1.12.


As of today, the HPA has total control over scale-up. You can only fine-tune the scale-down operation, using the following kube-controller-manager flag:

--horizontal-pod-autoscaler-downscale-stabilization
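To illustrate what a downscale stabilization window does, here is a rough simulation, not the controller's actual code. It assumes the behaviour described in the Kubernetes documentation: the HPA remembers recent replica recommendations and, when scaling down, acts on the highest recommendation seen within the window. The class `DownscaleStabilizer`, its method name, and the timestamps are illustrative assumptions; the 300-second window is chosen to match what I understand to be the default of five minutes.

```python
from collections import deque


class DownscaleStabilizer:
    """Toy model of a downscale stabilization window (illustrative only)."""

    def __init__(self, window_seconds: float = 300.0):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp, recommended replicas)

    def stabilize(self, now: float, recommendation: int) -> int:
        self._history.append((now, recommendation))
        # Forget recommendations that are older than the window.
        while self._history and self._history[0][0] < now - self.window_seconds:
            self._history.popleft()
        # Using the highest recent recommendation delays scale-down,
        # while a new, higher recommendation takes effect immediately.
        return max(replicas for _, replicas in self._history)


stabilizer = DownscaleStabilizer(window_seconds=300)
for t, rec in [(0, 10), (60, 4), (120, 4), (301, 4), (420, 4)]:
    print(f"t={t:>3}s recommended={rec:>2} applied={stabilizer.stabilize(t, rec)}")
```

In this run the recommendation drops from 10 to 4 at t=60s, but the applied replica count stays at 10 until the earlier recommendation leaves the window, after which it falls to 4. A longer window makes scale-down slower and smoother; a shorter one makes it react faster.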

The good news is that there is a proposal for configurable scale up/down velocity for HPA.