cuda device selection with multiple cpu threads

multithreading


Yes, the GPU device needs to be set explicitly, or the default one (usually device 0) will be used.

Keep in mind that once the runtime starts using one device, all subsequent CUDA calls made in the same thread will be pinned to that device.

Something I find useful upon starting a thread is:

cudaThreadExit();        // clears all the runtime state for the current thread
cudaSetDevice(deviceId); // explicitly set the current device for the subsequent calls (cudaMalloc, cudaMemcpy, etc.)

(Note that cudaThreadExit() has since been deprecated in favor of cudaDeviceReset().)

The programming guide has a chapter dedicated to it.
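As a minimal sketch of that per-thread pattern (not from the original answer; the one-thread-per-GPU assignment and the worker function are illustrative assumptions, and error checking is elided):

```cuda
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Each host thread pins itself to one GPU before doing any CUDA work;
// every subsequent runtime call in that thread then targets that device.
void worker(int deviceId) {
    cudaSetDevice(deviceId);
    float* d_buf = nullptr;
    cudaMalloc(&d_buf, 1024 * sizeof(float));
    // ... launch kernels, cudaMemcpy, etc. on this device ...
    cudaFree(d_buf);
}

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    std::vector<std::thread> threads;
    for (int i = 0; i < deviceCount; ++i)
        threads.emplace_back(worker, i);  // one host thread per GPU
    for (auto& t : threads) t.join();
    return 0;
}
```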


It depends on the compute mode the GPUs are set to.

Call nvidia-smi -q to find the Compute Mode of your GPU. Depending on the version of the CUDA framework you use, the output will be different.

By default, GPUs are in DEFAULT mode, which allows several contexts to run alternately on the same GPU. However, each context must explicitly release the GPU: while one context owns the GPU, the others are blocked for a short period, then killed after a timeout.

To bypass this limitation, you can call nvidia-smi -c with one of these explicit values, depending on your needs:

  • DEFAULT
  • EXCLUSIVE_THREAD
  • PROHIBITED
  • EXCLUSIVE_PROCESS
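For example (a sketch, not from the original answer; GPU index 0 and root privileges are assumed, and flags may vary across driver versions):

```shell
# Query the current compute mode of every GPU
nvidia-smi -q -d COMPUTE

# Restrict GPU 0 to a single process at a time
sudo nvidia-smi -i 0 -c EXCLUSIVE_PROCESS

# Restore the default shared mode
sudo nvidia-smi -i 0 -c DEFAULT
```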


Yes, GPU devices need to be set explicitly.

One simple strategy would consist of setting all the GPUs to EXCLUSIVE_THREAD (as shown by jopasserat). A thread would then iterate through all the available GPUs and try to acquire a free one until it succeeds.

The same mechanism would work fine in case of EXCLUSIVE_PROCESS.
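A sketch of that acquisition loop (an illustrative assumption, not from the original answer; it relies on the fact that in an exclusive compute mode, context creation fails on a device already owned by another thread or process, and cudaFree(0) is used here only to force context creation):

```cuda
#include <cuda_runtime.h>

// Try each GPU in turn and return the index of the first one this
// thread can claim, or -1 if every device is already taken.
int acquireFreeGpu() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    for (int i = 0; i < deviceCount; ++i) {
        cudaSetDevice(i);
        if (cudaFree(0) == cudaSuccess)  // forces context creation on device i
            return i;                    // success: this thread now owns GPU i
        cudaGetLastError();              // clear the error and try the next device
    }
    return -1;
}
```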

See section 3.4, "Compute Modes", in the CUDA Toolkit documentation.