Kubernetes + TF Serving - how to serve hundreds of ML models without keeping hundreds of idle pods running?

What you are trying to do is scale deployments to zero when they are not in use.

Kubernetes does not provide this functionality out of the box.

You can achieve it with the Knative Pod Autoscaler. Knative is probably the most mature solution available at the time of writing.
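
As a sketch of how this looks in practice: Knative's default autoscaler supports scale-to-zero, and you can make it explicit per revision with the `autoscaling.knative.dev/min-scale` annotation. The manifest below is illustrative (the service name, model name, and paths are placeholders, not from your setup):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: tf-model-a            # illustrative name, one Knative Service per model
spec:
  template:
    metadata:
      annotations:
        # Allow this revision to scale down to zero pods when idle
        autoscaling.knative.dev/min-scale: "0"
        # Cap replicas so a traffic burst cannot overwhelm the cluster
        autoscaling.knative.dev/max-scale: "5"
    spec:
      containers:
        - image: tensorflow/serving
          args:
            - "--model_name=model_a"
            - "--model_base_path=/models/model_a"   # placeholder path
          ports:
            - containerPort: 8501                   # TF Serving REST port
```

With one such Service per model, idle models consume no pods; Knative buffers the first request while a pod cold-starts, so expect added latency on the first call after an idle period.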

There are also more experimental projects, such as osiris or zero-pod-autoscaler, that you may find interesting and that could be a good fit for your use case.