
How do we assign pods properly so that KFServing can scale down GPU instances to zero?


tl;dr: You can use taints.

Which pods need to be assigned to our GPU nodes?

Only the pods of the jobs that actually require a GPU.

If your training job requires a GPU, you need to assign it using nodeSelector and tolerations in the spec of your training job or deployment; see a nice example here.
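For illustration, here is a minimal sketch of a training Job pinned to a GPU nodegroup. The label accelerator: nvidia, the taint key nvidia.com/gpu, and the image name are assumptions; substitute whatever your nodegroup actually uses.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: train-model
spec:
  template:
    spec:
      restartPolicy: Never
      # Land on the GPU nodegroup (hypothetical node label).
      nodeSelector:
        accelerator: nvidia
      # Tolerate the taint that repels everything else (assumed taint key).
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - name: trainer
          image: my-training-image:latest  # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1  # request one GPU from the device plugin
```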

If your model is CV/NLP (i.e., it performs many matrix multiplications), you might want to run the InferenceService on the GPU as well; in that case you need to request the GPU in its spec, as described here.
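As a sketch, an InferenceService that requests a GPU could look like the following (assuming the v1beta1 API and a TensorFlow model; the name and storageUri are placeholders). Requesting nvidia.com/gpu in the limits is what steers the predictor pod onto a GPU node; depending on your cluster, the ExtendedResourceToleration admission controller may add the matching toleration automatically, otherwise add it to the predictor spec yourself.

```yaml
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: my-model  # placeholder name
spec:
  predictor:
    tensorflow:
      storageUri: gs://my-bucket/my-model  # placeholder model location
      resources:
        limits:
          nvidia.com/gpu: 1  # schedule the predictor onto a GPU node
```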

Do we only need our Argo workflow pod to be assigned, and repel the rest?

Yes, if your InferenceService does not require a GPU.

Are there other KFServing components that need to run on the GPU node for this to work right?

No. The only KFServing component is the kfserving-controller, and it does not require a GPU, since it only orchestrates the creation of the Istio and Knative resources for your InferenceService.

If there are InferenceServices running in your GPU nodegroup without the GPU requested in their spec, it means the nodegroup is not configured with the NoSchedule taint effect. Make sure that the GPU nodegroup in the eksctl configuration has the taint, as described in the doc.
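For reference, a minimal sketch of the relevant part of an eksctl ClusterConfig with the taint in place. The cluster name, region, instance type, label, and taint value are placeholders, and older eksctl releases used a map syntax for taints instead of this list form.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster   # placeholder
  region: us-west-2  # placeholder
nodeGroups:
  - name: gpu-nodegroup
    instanceType: p3.2xlarge  # placeholder GPU instance type
    minSize: 0                # lets the cluster autoscaler scale the group to zero
    maxSize: 4
    desiredCapacity: 0
    labels:
      accelerator: nvidia     # matched by the nodeSelector in the Job sketch above
    taints:
      - key: nvidia.com/gpu
        value: "present"
        effect: NoSchedule    # repels every pod that lacks the toleration
```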