
Setting up a multi-user job scheduler for data science / ML tasks


As far as I know, Kubernetes does not support GPU sharing, which was asked here.

There is an ongoing discussion: Is sharing GPU to multiple containers feasible? #52757

I was able to find a Docker image with examples that "unofficially supports sharing GPUs", available here: cvaldit/nvidia-k8s-device-plugin.

It can be used in the following way:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-devel
      resources:
        limits:
          nvidia.com/gpu: 2 # requesting 2 GPUs
    - name: digits-container
      image: nvidia/digits:6.0
      resources:
        limits:
          nvidia.com/gpu: 2 # requesting 2 GPUs

That would expose 2 GPUs inside each container to run your job in, and it would also lock those GPUs from further use until the job ends.
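Assuming the manifest above is saved as gpu-pod.yaml (a filename chosen here for illustration), it could be applied and checked with standard kubectl commands, roughly like this:

```
# create the pod from the manifest
kubectl apply -f gpu-pod.yaml

# verify the pod was scheduled onto a node with enough GPUs
kubectl get pod gpu-pod -o wide

# inspect how many GPUs a node advertises and has allocated
kubectl describe node <node-name> | grep -A 5 "nvidia.com/gpu"
```

If no node has the requested number of free GPUs, the pod stays Pending until capacity is released.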

I'm not sure how you would scale this for multiple users, other than limiting the maximum number of GPUs each job can use.
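One possible way to enforce such a limit, assuming you give each user (or team) their own namespace, is a ResourceQuota on the extended resource; this is a sketch, with the namespace name and GPU cap chosen for illustration:

```
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: team-a            # assumes one namespace per user/team
spec:
  hard:
    requests.nvidia.com/gpu: 4 # at most 4 GPUs requested in this namespace at a time
```

With that in place, pods in team-a whose combined GPU requests would exceed 4 are rejected at admission time rather than queued.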

You can also read about Schedule GPUs, a feature that is still experimental.