How do I make my EKS AutoScalingGroup start a node with a specific instance-type if one is not already running?


This turned out to be a gap in my understanding of how the auto-scaler and ASGs work. Based on feedback from someone in a different forum, I learned that:

A) the auto-scaler runs as a pod on the cluster itself (which is why out-of-the-box EKS does not support a minimum of 0 nodes; at least one node is required to run the kube-system/auto-scaler pods),

and B) a single auto-scaler pod can scale multiple ASGs on the cluster. This lets us separate our instances into ASGs by cost, and ensures that the expensive instances are only used when the worker pods request them.
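As context for point B, the multi-ASG setup comes down to what the autoscaler is told to manage. Below is a minimal sketch of the cluster-autoscaler Deployment, assuming made-up ASG names, cluster name, and image version; the `--nodes` flag takes `min:max:asg-name` and can be repeated, or auto-discovery by tag can be used instead:

    # Sketch only: ASG names, cluster name, and image tag are assumptions.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cluster-autoscaler
      namespace: kube-system
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: cluster-autoscaler
      template:
        metadata:
          labels:
            app: cluster-autoscaler
        spec:
          serviceAccountName: cluster-autoscaler   # assumes RBAC/IAM already set up
          containers:
            - name: cluster-autoscaler
              image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0   # version is an assumption
              command:
                - ./cluster-autoscaler
                - --cloud-provider=aws
                - --nodes=1:1:small-247-asg        # the always-on, cheap ASG
                - --nodes=0:10:xlarge-burst-asg    # the burstable, expensive ASG
                # or discover groups by tag instead of listing them explicitly:
                # - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster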

Our solution so far is this:

  • Set up at least two ASGs:

    1. One ASG that runs the 24/7 pods and the kube-system pods. This ASG uses the smaller, cheaper instance types.
    2. One or more ASGs (whatever fits the use case) that run the burstable pods. These ASGs use the larger, more expensive instance types required for the task processing.
  • Apply identifying labels to the ASGs. The EKS-recommended approach (especially if you want to use Spot instances) is to use the instance size as the label (e.g. micro, large, 4xlarge). This lets you easily add instances with the same resource sizes to an existing ASG for more reliability (a fuller sketch follows this list). Example:

      Labels:             asgsize=xlarge
  • Apply a nodeSelector in the pod YAML to match the desired node:

      spec:
        nodeSelector:
          asgsize: xlarge
  • Set the 24/7, small-instance ASG to min=1, desired=1, max=1 (at least; max can be bigger if that fits your needs)

  • Set the burstable, large-instance ASG to min=0, desired=0, max=(whatever is required for your environment)

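To make the list above concrete, here is a rough eksctl-style sketch of the two node groups. The group names, instance types, sizes, cluster name, and region are all made up, and the `k8s.io/cluster-autoscaler/node-template/label/...` tag is one way to let the autoscaler know a group's node labels while that group is sitting at 0 nodes:

    # Hypothetical eksctl ClusterConfig sketching the two-ASG layout described above.
    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: my-cluster          # assumed cluster name
      region: us-east-1         # assumed region
    nodeGroups:
      - name: small-247         # runs the 24/7 pods and kube-system
        instanceType: t3.medium
        minSize: 1
        desiredCapacity: 1
        maxSize: 1
        labels:
          asgsize: medium
      - name: xlarge-burst      # runs the burstable worker pods
        instanceType: m5.4xlarge
        minSize: 0
        desiredCapacity: 0
        maxSize: 10
        labels:
          asgsize: xlarge
        tags:
          # lets the autoscaler infer node labels while the group is at 0 nodes
          k8s.io/cluster-autoscaler/node-template/label/asgsize: xlarge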
When we implemented this approach, we ended up with a small instance running 24/7, and the larger instances scaled up from 0 only when a pod carrying that nodeSelector was created.

Disclaimer:

We also ran into this little bug in the auto-scaler, where the large ASG was not scaling up from 0 initially: https://github.com/kubernetes/autoscaler/issues/2418

The workaround in that issue worked for us. We forced our large ASG to min=1, started a pod on that group, set min=0 again, and deleted the pod. The instance scaled down and was terminated, and the next time we requested the pod, it scaled up correctly.


I have never had this use case, but I think you should try a combination of the cluster autoscaler with nodeAffinity.

Refer: Special note on GPU instances
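For completeness, a minimal sketch of the nodeAffinity variant, reusing the asgsize label from the answer above (the pod name and image are placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: burst-worker            # placeholder pod name
    spec:
      affinity:
        nodeAffinity:
          # hard requirement: only schedule onto nodes labeled asgsize=xlarge
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: asgsize
                    operator: In
                    values:
                      - xlarge
      containers:
        - name: worker
          image: busybox            # placeholder image
          command: ["sleep", "3600"]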