
Setting up a multi-user job scheduler for data science / ML tasks


As far as I know, Kubernetes does not support GPU sharing, which was asked here.

There is an ongoing discussion: Is sharing GPU to multiple containers feasible? #52757

I was able to find a Docker image with examples that "unofficially supports sharing GPUs", available here: cvaldit/nvidia-k8s-device-plugin.

It can be used in the following way:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-devel
      resources:
        limits:
          nvidia.com/gpu: 2 # requesting 2 GPUs
    - name: digits-container
      image: nvidia/digits:6.0
      resources:
        limits:
          nvidia.com/gpu: 2 # requesting 2 GPUs

That would expose 2 GPUs inside each container to run your job in, and it would also lock those GPUs from further use until the job ends.
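Assuming the manifest above is saved as gpu-pod.yaml (a filename chosen here for illustration), it could be applied and checked with standard kubectl commands, roughly like this:

```
# create the pod from the manifest
kubectl apply -f gpu-pod.yaml

# verify the pod was scheduled onto a node with enough GPUs
kubectl get pod gpu-pod -o wide

# inspect how many GPUs a node advertises and has allocated
kubectl describe node <node-name> | grep -A 5 "nvidia.com/gpu"
```

If no node has the requested number of free GPUs, the pod stays Pending until capacity is released.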

I'm not sure how you would scale this for multiple users, other than limiting the maximum number of GPUs each job can use.
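One possible way to enforce such a limit, assuming you give each user (or team) their own namespace, is a ResourceQuota on the extended resource; this is a sketch, with the namespace name and GPU cap chosen for illustration:

```
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: team-a            # assumes one namespace per user/team
spec:
  hard:
    requests.nvidia.com/gpu: 4 # at most 4 GPUs requested in this namespace at a time
```

With that in place, pods in team-a whose combined GPU requests would exceed 4 are rejected at admission time rather than queued.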

You can also read about Schedule GPUs, a feature that is still experimental.