
How do we assign pods properly so that KFServing can scale down GPU instances to zero?


tl;dr: You can use taints.

Which pods need to be assigned to our GPU nodes?

Only the pods of the jobs that actually require a GPU.

If your training job requires a GPU, you need to assign it using nodeSelector and tolerations in the spec of your training job or deployment; see a nice example here.
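For illustration, here is a minimal sketch of a training Job pinned to a GPU nodegroup. The label accelerator: nvidia, the taint key nvidia.com/gpu, and the image name are assumptions; substitute whatever your nodegroup actually uses.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: train-model
spec:
  template:
    spec:
      restartPolicy: Never
      # Land on the GPU nodegroup (hypothetical node label).
      nodeSelector:
        accelerator: nvidia
      # Tolerate the taint that repels everything else (assumed taint key).
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - name: trainer
          image: my-training-image:latest  # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1  # request one GPU from the device plugin
```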

If your model is CV/NLP (i.e., it performs many matrix multiplications), you might want to run the InferenceService on the GPU as well; in that case you need to request the GPU in its spec, as described here.
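As a sketch, an InferenceService that requests a GPU could look like the following (assuming the v1beta1 API and a TensorFlow model; the name and storageUri are placeholders). Requesting nvidia.com/gpu in the limits is what steers the predictor pod onto a GPU node; depending on your cluster, the ExtendedResourceToleration admission controller may add the matching toleration automatically, otherwise add it to the predictor spec yourself.

```yaml
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: my-model  # placeholder name
spec:
  predictor:
    tensorflow:
      storageUri: gs://my-bucket/my-model  # placeholder model location
      resources:
        limits:
          nvidia.com/gpu: 1  # schedule the predictor onto a GPU node
```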

Do we only need our Argo workflow pod to be assigned, and repel the rest?

Yes, if your InferenceService does not require a GPU.

Are there other KFServing components that need to run on the GPU node for this to work right?

No. The only KFServing component is the kfserving-controller, and it does not require a GPU, since it only orchestrates the creation of the Istio and Knative resources for your InferenceService.

If there are InferenceServices running in your GPU nodegroup without the GPU requested in their spec, it means the nodegroup is not configured with the NoSchedule taint effect. Make sure that the GPU nodegroup in the eksctl configuration has the taint, as described in the doc.
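For reference, a minimal sketch of the relevant part of an eksctl ClusterConfig with the taint in place. The cluster name, region, instance type, label, and taint value are placeholders, and older eksctl releases used a map syntax for taints instead of this list form.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster   # placeholder
  region: us-west-2  # placeholder
nodeGroups:
  - name: gpu-nodegroup
    instanceType: p3.2xlarge  # placeholder GPU instance type
    minSize: 0                # lets the cluster autoscaler scale the group to zero
    maxSize: 4
    desiredCapacity: 0
    labels:
      accelerator: nvidia     # matched by the nodeSelector in the Job sketch above
    taints:
      - key: nvidia.com/gpu
        value: "present"
        effect: NoSchedule    # repels every pod that lacks the toleration
```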