Kubernetes HPA not downscaling as expected
The formula for how the HPA decides how many pods to run is in the Horizontal Pod Autoscaler documentation:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
With the numbers you give, `currentReplicas` is 3, `currentMetricValue` is 300 MiB, and `desiredMetricValue` is 400 MiB, so this reduces to:

desiredReplicas = ceil[3 * (300 / 400)]
desiredReplicas = ceil[3 * 0.75]
desiredReplicas = ceil[2.25]
desiredReplicas = 3
You need to decrease the load further (below about 267 MiB average memory utilization, so that ceil[3 * x/400] evaluates to 2) or increase the target memory utilization for this to scale down.
(Simply being below the target won't trigger scale-down on its own; you must be far enough below the target for this formula to produce a lower number. This helps avoid thrashing when the load hovers right around a threshold that would trigger scaling in one direction or the other.)
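The formula and the scale-down threshold above can be sketched in a few lines (a worked example, not anything the HPA controller itself exposes):

```python
import math

def desired_replicas(current_replicas, current_metric, desired_metric):
    """The HPA scaling formula from the Kubernetes documentation:
    desiredReplicas = ceil[currentReplicas * (currentMetric / desiredMetric)]
    """
    return math.ceil(current_replicas * (current_metric / desired_metric))

# The numbers from the question: 3 replicas averaging 300 MiB,
# with a 400 MiB target -> ceil(2.25) = 3, so no scale-down.
print(desired_replicas(3, 300, 400))  # -> 3

# Average memory must drop to roughly 266 MiB or below before the
# formula yields 2: ceil(3 * 266/400) = ceil(1.995) = 2.
print(desired_replicas(3, 266, 400))  # -> 2
```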
There are two things to look at:
- The API version: The beta version, which includes support for scaling on memory and custom metrics, can be found in `autoscaling/v2beta2`. The new fields introduced in `autoscaling/v2beta2` are preserved as annotations when working with `autoscaling/v1`.

  `autoscaling/v2beta2` was introduced in Kubernetes 1.12, so even though you are using 1.13 (which is several minor versions behind by now) it should work fine; upgrading to a newer release is still recommended. Try changing your `apiVersion:` to `autoscaling/v2beta2`.
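A minimal `autoscaling/v2beta2` manifest with a memory target might look like the following; the name, replica bounds, and Deployment reference are placeholders to adapt to your setup:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app            # hypothetical name; match your Deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app          # hypothetical target
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 400Mi   # the 400 MiB target from the question
```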
- `--horizontal-pod-autoscaler-downscale-stabilization`: The value for this option is a duration that specifies how long the autoscaler has to wait before another downscale operation can be performed after the current one has completed. The default value is 5 minutes (`5m0s`).
Check the value of this particular flag after changing the API suggested above.
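One way to check the flag, assuming a kubeadm-style cluster where the controller manager runs as a static pod in `kube-system` (on managed clusters the flag may not be visible at all):

```shell
# Print the kube-controller-manager command line and look for the
# downscale-stabilization flag; if nothing matches, the default 5m0s applies.
kubectl -n kube-system get pod -l component=kube-controller-manager \
  -o jsonpath='{.items[*].spec.containers[*].command}' \
  | tr ',' '\n' | grep downscale-stabilization
```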