Python GPU programming [closed] Python GPU programming [closed] python python

Python GPU programming [closed]


PyCUDA provides very good integration with CUDA and has several helper interfaces to make writing CUDA code easier than in the straight C api. Here is an example from the Wiki which does a 2D FFT without needing any C code at all.


I will publish here some information that I read on reddit. It will be useful for people who are coming without a clear idea of what different packages do and how they connect cuda with Python:


From:Reddit

There's a lot of confusion in this thread about what various projects aim to do and how ready they are. There is no "GPU backend for NumPy" (much less for any of SciPy's functionality). There are a few ways to write CUDA code inside of Python and some GPU array-like objects which support subsets of NumPy's ndarray methods (but not the rest of NumPy, like linalg, fft, etc..)

  • PyCUDA and PyOpenCL come closest. They eliminate a lot of the plumbing surrounding launching GPU kernels (simplified array creation & memory transfer, no need for manual deallocation, etc...). For the most part, however, you're still stuck writing CUDA kernels manually, they just happen to be inside your Python file as a triple-quoted string. PyCUDA's GPUarray does include some limited NumPy-like functionality, so if you're doing something very simple you might get away without writing any kernels yourself.

  • NumbaPro includes a "cuda.jit" decorator which lets you write CUDA kernels using Python syntax. It's not actually much of an advance over what PyCUDA does (quoted kernel source), it's just your code now looks more Pythonic. It definitely doesn't, however, automatically run existing NumPy code on the GPU.

  • Theano let you construct symbolic expression trees and then compiles them to run on the GPU. It's not NumPy and only has equivalents for a small subset of NumPy's functionality.

  • gnumpy is a thinly documented wrapper around CudaMat. The only supported element type is float32 and only a small subset of NumPy is implemented.



I know that this thread is old, but I think I can bring some relevant information that answers to the question asked.

Continuum Analytics has a package that contains libraries that resolves the CUDA computing for you. Basically you instrument your code that needs to be parallelized (within a function) with a decorator and you need to import a library. Thus, you don't need any knowledge about CUDA instructions.

Information can be found on NVIDIA page

https://developer.nvidia.com/anaconda-accelerate

or you can go directly to the Continuum Analytics' page

https://store.continuum.io/cshop/anaconda/

There is a 30 day trial period and a free licence for academics.

I use this extensively and accelerates my code between 10 to 50 times.