How to load from disk, process, then store data in a common hdf5 concurrently with python, pyqt, h5py?


This is quite a complex problem to solve, and this format is not really suited to providing complete answers to all your questions. However, I'll attempt to put you on the right track.

How do I make this process non-blocking, so that the UI is still responsive? I'd still like my processing function to be able to update the QProgressDialog and its associated label.

To make it non-blocking, you need to offload the processing into a Python thread or QThread. Better yet, offload it into a subprocess that communicates progress back to the main program via a thread in the main program.

I'll leave you to implement (or ask another question on) creating subprocesses or threads. However, be aware that only the main thread can call GUI methods. This means you need to emit a signal if using a QThread, or use QApplication.postEvent() from a Python thread. (I've wrapped the latter up into a library for Python 2.7 here; Python 3 compatibility will come one day.)
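
To make the "worker never touches the GUI" rule concrete, here is a minimal sketch using only the standard library. It is not the signal or postEvent() mechanism itself: the worker only calls a `report` callback, and the main thread picks the messages up from a queue. The names `process_dataset`, `run_in_background` and `drain` are made up for illustration, and `sum(chunk)` is a stand-in for your real processing.

```python
import queue
import threading


def process_dataset(chunks, report):
    """Process chunks in order, calling report(done, total) after each one.

    The function never touches GUI objects; it only calls `report`.
    """
    total = len(chunks)
    results = []
    for done, chunk in enumerate(chunks, start=1):
        results.append(sum(chunk))  # stand-in for the real processing step
        report(done, total)
    return results


def run_in_background(chunks):
    """Start the worker thread and return (thread, progress queue)."""
    progress = queue.Queue()
    worker = threading.Thread(
        target=process_dataset,
        args=(chunks, lambda done, total: progress.put((done, total))),
    )
    worker.start()
    return worker, progress


def drain(progress):
    """Collect every progress message currently queued, without blocking."""
    messages = []
    while True:
        try:
            messages.append(progress.get_nowait())
        except queue.Empty:
            return messages
```

In a Qt program, the main thread could call `drain` periodically (for example from a QTimer slot) and pass the latest value to the QProgressDialog. If you go the QThread route instead, `report` could simply be the `emit` method of a signal you define, and no queue is needed.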

Can I extend this to process more than one dataset concurrently and retain the ability to update the progressbar info?

Yes. One example would be to spawn many subprocesses. Each subprocess can be configured to send messages back to an associated thread in the main process, which communicates the progress information to the GUI via the method described for the above point. How you display this progress information is up to you.
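
A rough sketch of that layout, assuming each child simply prints a "done total" line on stdout after every step (that message format, `CHILD_CODE`, `watch_child` and `run_all` are all my own inventions for the example). One watcher thread per subprocess reads those lines and forwards them to a shared queue.

```python
import queue
import subprocess
import sys
import threading

# Hypothetical child program: prints "done total" after each processing step.
CHILD_CODE = """
import sys
total = int(sys.argv[1])
for done in range(1, total + 1):
    # ... load and process one piece of the dataset here ...
    print(done, total, flush=True)
"""


def watch_child(name, steps, messages):
    """Run one child process and forward its progress lines to `messages`."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE, str(steps)],
        stdout=subprocess.PIPE,
        text=True,
    )
    for line in proc.stdout:
        done, total = map(int, line.split())
        messages.put((name, done, total))
    proc.wait()


def run_all(jobs):
    """Start one child plus one watcher thread per job; return latest progress."""
    messages = queue.Queue()
    watchers = [
        threading.Thread(target=watch_child, args=(name, steps, messages))
        for name, steps in jobs.items()
    ]
    for watcher in watchers:
        watcher.start()
    # Joined here only to keep the sketch self-contained; a GUI would not
    # block like this but would drain `messages` periodically instead.
    for watcher in watchers:
        watcher.join()
    latest = {}
    while not messages.empty():
        name, done, total = messages.get()
        latest[name] = (done, total)
    return latest
```

Because every message carries the job name, the GUI side can keep one progress bar per dataset or fold them into a single overall figure.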

Can I write into h5py from more than one thread/process/etc.? Will I have to implement locking on the write operation?

You should not write to an HDF5 file from more than one thread at a time, so you will need to implement locking. I suspect that even read access should be serialised.
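
One simple way to get that locking between threads is to route every access through a single object that owns a lock. This is only a sketch: `SerializedStore` is a made-up name, and I'm assuming `store` is an open h5py.File, where `store[name] = data` creates a dataset.

```python
import threading


class SerializedStore:
    """Let many threads share one store while only one touches it at a time."""

    def __init__(self, store):
        # `store` is assumed to be an open h5py.File; any object supporting
        # store[name] = data works for this sketch.
        self._store = store
        self._lock = threading.Lock()

    def write(self, name, data):
        """Store `data` under `name` while holding the lock."""
        with self._lock:
            self._store[name] = data

    def run(self, func):
        """Call func(store) under the same lock, e.g. for a serialised read."""
        with self._lock:
            return func(self._store)
```

Note that a `threading.Lock` only protects threads inside one process. If the processing happens in subprocesses, the simplest arrangement is probably to send results back to the main process and let a single writer there own the file.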

A colleague of mine has produced something along these lines for Python 2.7 (see here and here); you are welcome to look at it or fork it if you wish.