Redirecting CUDA printf to a C++ stream


Device-side printf() implicitly serializes the threads that are printing, so you probably wouldn't want to use it in production code.

Device-side printf() works by having the kernel copy messages to a preallocated ring buffer. Upon implicit or explicit device synchronization (for example, cudaDeviceSynchronize()), CUDA dumps the contents of the buffer to stdout and then clears it.
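Because the flush goes to the process's stdout, one way to get that text into a C++ stream on Linux is to point file descriptor 1 at a temporary file around the synchronization call. The following is a minimal host-only sketch, assuming the CUDA runtime writes its printf output through descriptor 1. capture_stdout is a hypothetical helper name, and the callable stands in for whatever triggers the flush, such as cudaDeviceSynchronize().

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <unistd.h>  // dup, dup2, close (POSIX)

// Hypothetical helper: runs flush_point while file descriptor 1 points at a
// temporary file, then returns everything written to stdout in the meantime.
std::string capture_stdout(const std::function<void()>& flush_point) {
    std::fflush(stdout);                    // push out text buffered earlier
    int saved_fd = dup(STDOUT_FILENO);      // remember the real stdout
    std::FILE* tmp = std::tmpfile();
    if (saved_fd < 0 || tmp == nullptr) {   // cannot redirect: run uncaptured
        if (saved_fd >= 0) close(saved_fd);
        if (tmp != nullptr) std::fclose(tmp);
        flush_point();
        return std::string();
    }
    dup2(fileno(tmp), STDOUT_FILENO);       // stdout now lands in tmp
    flush_point();                          // e.g. cudaDeviceSynchronize()
    std::fflush(stdout);
    dup2(saved_fd, STDOUT_FILENO);          // restore the real stdout
    close(saved_fd);

    // Read the temporary file back from the beginning.
    std::string captured;
    std::rewind(tmp);
    char chunk[4096];
    std::size_t n;
    while ((n = std::fread(chunk, 1, sizeof chunk, tmp)) > 0) {
        captured.append(chunk, n);
    }
    std::fclose(tmp);
    return captured;
}
```

The returned string can then be written to any std::ostream. This keeps the built-in printf() and its serialization cost; it only changes where the text ends up.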

You could simply implement your own device printf(). Its performance would probably be no worse than that of the built-in one. The only disadvantage is that you would have to pass the ring buffer to the kernel and add a call to process it after the kernel returns.

An implementation would look something like this:

  • Create a buffer with room for a fixed number of printf() formatting strings and associated 32-bit or 64-bit parameters.

  • Create a device function that takes a formatting string and parameters, uses atomicInc() to claim the current print location, and copies the string and parameters into that location.

  • Pass the ring buffer to the kernel, which then passes it to the device function together with the print parameters.

  • Create a host function that takes the ring buffer, runs the formatting strings and parameters through host-side sprintf(), and passes the results to the logger.