Trying to understand what values to use for resources and limits of multiple container deployment


How do I determine what values to use for my CPU and memory requests and limits fields? The main difficulty is the variable replica count: do I need to account for the maximum number of replicas each using their resources, or for the deployment in general? Do I plan on a per-pod basis or for each container individually?

Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory. They are set on each container individually, not on the pod or on the deployment as a whole.

  • Requests are what the container is guaranteed to get. If a container requests a resource, Kubernetes will only schedule its pod on a node that can provide that resource.
  • Limits, on the other hand, make sure a container never goes above a certain value. A container that reaches its CPU limit is throttled, while one that exceeds its memory limit is terminated (OOMKilled) and restarted according to the pod's restart policy.
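
As a rough illustration of where these fields live, here is a minimal sketch of a two-container Deployment. The names (my-app, sidecar), images, and all the numbers are placeholders I made up, not values taken from your setup; the right numbers have to come from observing your application's actual usage.

```yaml
# Hypothetical example: names, images, and values are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: example.com/my-app:1.0
          resources:
            requests:        # what the scheduler reserves for this container
              cpu: 250m
              memory: 256Mi
            limits:          # the ceiling this container may not exceed
              cpu: 500m
              memory: 512Mi
        - name: sidecar
          image: example.com/my-sidecar:1.0
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 128Mi
```

With these assumed values the scheduler adds up the container requests, so every replica needs a node with 350m CPU and 320Mi memory still unreserved. Each replica is scheduled on its own, so the values describe one pod and do not need to be multiplied by the replica count.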

Size the requests and limits for a single replica. The number of replicas will then be determined by the autoscaler acting on the Deployment (and, through it, on its ReplicaSet).

When I deploy my file, my deployment is either stuck in a Pending state or keeps restarting multiple times until it gets terminated.

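  • A Pending state usually means that no node has enough unreserved resources to satisfy the pod's requests, so the scheduler cannot place the new pod. The events shown by kubectl describe pod will tell you which resource is short.

  • Restarts may be triggered by other issues (for example, a memory limit that is too low for the application). I'd suggest debugging them after solving the scaling issues.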

My horizontal pod autoscaler also reports targets as <unknown>/80%, but I believe it is due to me removing resources from my deployment, as it was not working.

  • You are correct: if you don't set resource requests, utilization cannot be expressed as a percentage, so the target stays <unknown> and the autoscaler cannot trigger scaling up or down.

  • Here you can see the algorithm responsible for that.

  • Horizontal Pod Autoscaler adds pods based on usage as a percentage of the pods' requests. In this case, whenever the average CPU usage across the pods exceeds 80% of the requested value, it will add new pods, up to the maximum specified.
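
For reference, the scaling rule described in the Kubernetes documentation can be written as below. The numbers in the worked example (3 replicas averaging 120% of their CPU request) are assumed purely for illustration.

```latex
% Scaling rule from the Kubernetes HPA documentation
\[
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
% Worked example with assumed numbers: 3 replicas at 120% average utilization, target 80%
\[
\left\lceil 3 \times \frac{120}{80} \right\rceil = \lceil 4.5 \rceil = 5
\]
```

Because the metric is a percentage of the request, the calculation has nothing to divide by when requests are missing, which is why the target shows up as <unknown>.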

For a good HPA example, check this link: Horizontal Pod Autoscaler Walkthrough
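
A minimal sketch of an HPA with an 80% CPU target might look like the following. The Deployment name my-app and the replica bounds are assumptions for illustration, and it presumes a metrics source such as metrics-server is running in the cluster.

```yaml
# Hypothetical example: the target name and replica bounds are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # percent of the pods' CPU requests
```

For a pod with several containers, every container needs a CPU request for this utilization target to be computable.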


But how does Horizontal Pod Autoscaler work with Cluster Autoscaler?

  • Horizontal Pod Autoscaler changes the deployment's or replicaset's number of replicas based on the current CPU load. If the load increases, HPA will create new replicas, for which there may or may not be enough space in the cluster.

  • If there are not enough resources, Cluster Autoscaler (CA) will try to bring up some nodes, so that the HPA-created pods have a place to run. If the load decreases, HPA will stop some of the replicas. As a result, some nodes may become underutilized or completely empty, and then CA will terminate such unneeded nodes.

NOTE: The key is to set the maximum number of replicas for the HPA with the whole cluster in mind, according to the number of nodes (and budget) available for your app. You can start with a very high maximum, monitor it, and then adjust it according to the usage metrics and the predicted future load.
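
As a back-of-the-envelope check on that maximum, you can estimate how many pods fit in the cluster. All figures here are assumptions for illustration: 3 nodes, each with 3500m CPU and 12288Mi memory allocatable, and pods that each request 350m CPU and 320Mi memory.

```latex
% Pods per node are bounded by whichever resource runs out first (assumed figures)
\[
\text{podsPerNode} = \min\left( \left\lfloor \frac{3500\,\text{m}}{350\,\text{m}} \right\rfloor,\; \left\lfloor \frac{12288\,\text{Mi}}{320\,\text{Mi}} \right\rfloor \right) = \min(10,\, 38) = 10
\]
% Upper bound for the whole cluster
\[
\text{maxReplicas} \le 3 \times 10 = 30
\]
```

This is only an upper bound: system pods and other workloads also reserve capacity on each node, so the usable number will be lower.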

If you have any questions, let me know in the comments.