Better way to share memory for multiprocessing in Python?


Queue is for communication between processes. In your case, you don't really have this kind of communication. You can simply let the process return its result and use the .get() method to collect it. (Remember to add if __name__ == "__main__":, see the programming guidelines.)

    from PIL import Image
    from multiprocessing import Pool
    import numpy

    img = Image.open("/path/to/image.jpg")

    def rz():
        # Reads the module-level `patch`; this relies on workers inheriting it at fork time
        totalPatchCount = 20000
        imgArray = numpy.asarray(patch, dtype=numpy.float32)
        list_im_arr = [imgArray] * totalPatchCount  # A more elegant way than a for loop
        return list_im_arr

    if __name__ == '__main__':
        # patch = img....  Your code to generate the patch here
        patch = patch.resize((60, 40), Image.LANCZOS)  # ANTIALIAS was removed in Pillow 10
        patch = patch.convert('L')
        pool = Pool(2)
        results = [pool.apply_async(rz) for x in range(2)]  # submit both tasks first
        imdata = [r.get() for r in results]  # then collect the returned lists
        pool.close()
        pool.join()

Now, according to the first answer of this post, multiprocessing can only pass objects that are picklable. Pickling is probably unavoidable in multiprocessing because processes don't share memory. They simply don't live in the same universe. (They do inherit memory when they're first spawned, but they cannot reach out of their own universe.) The PIL image object itself is not picklable. You can make it picklable by extracting only the image data stored in it, as this post suggested.
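As a rough illustration of the "extract only the data" idea, here is a minimal sketch. It does not use PIL: FakeImage is a hypothetical stand-in class that keeps an unpicklable lock next to plain data, and to_payload, from_payload and is_picklable are names invented for this example. With a real image you would pull out whichever plain fields (size, mode, raw bytes) you need.

```python
import pickle
import threading


class FakeImage:
    """Hypothetical stand-in for an object that wraps an unpicklable resource."""

    def __init__(self, size, data):
        self.size = size                # (width, height)
        self.data = data                # raw pixel bytes
        self._lock = threading.Lock()   # locks cannot be pickled


def to_payload(obj):
    """Extract only the plain, picklable parts of the object."""
    return {"size": obj.size, "data": obj.data}


def from_payload(payload):
    """Rebuild an equivalent object on the receiving side."""
    return FakeImage(payload["size"], payload["data"])


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    original = FakeImage((2, 2), b"\x00\x01\x02\x03")
    print("object picklable: ", is_picklable(original))
    payload = to_payload(original)
    print("payload picklable:", is_picklable(payload))
    restored = from_payload(pickle.loads(pickle.dumps(payload)))
    print("round trip intact:", restored.data == original.data)
```

The payload is what you would hand to a Pool or a Queue; the worker rebuilds the object on its side.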

Since your problem is mostly I/O bound, you can also try multithreading. It might be even faster for your purpose. Threads share everything, so no pickling is required. If you're using Python 3, ThreadPoolExecutor is a wonderful tool. For Python 2, you can use ThreadPool. To achieve higher efficiency, you'll have to rearrange how you do things: break up the work and let different threads each do part of the job.

    from PIL import Image
    from multiprocessing.pool import ThreadPool
    from multiprocessing import Lock
    import numpy

    img = Image.open("/path/to/image.jpg")
    lock = Lock()
    totalPatchCount = 20000

    def rz(x):
        patch = ...
        return patch

    pool = ThreadPool(8)
    imdata = [pool.map(rz, range(totalPatchCount)) for i in range(2)]
    pool.close()
    pool.join()
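For Python 3, the same pattern with ThreadPoolExecutor might look roughly like the sketch below. make_patch is a hypothetical placeholder for the per-patch work (it just returns a small list here), and build_patches and TOTAL_PATCH_COUNT are names made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

TOTAL_PATCH_COUNT = 20  # hypothetical small count, for illustration only


def make_patch(index):
    """Hypothetical placeholder for the real per-patch work (crop, resize, convert)."""
    return [index] * 4


def build_patches(count, workers=8):
    """Run make_patch for each index on a thread pool and return results in order."""
    # Executor.map yields results in input order, like ThreadPool.map
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(make_patch, range(count)))


if __name__ == "__main__":
    patches = build_patches(TOTAL_PATCH_COUNT)
    print(len(patches), patches[3])
```

The with block shuts the pool down and waits for the workers, so there is no separate close()/join() step.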


You say "Apparently Queues have a limited amount of data they can save otherwise when you call queue_obj.get() the program hangs."

You're right and wrong there. There is a limited amount of information the Queue will hold without being drained. The problem is that when you do:

    qn1.put(list_im_arr)
    qn1.cancel_join_thread()

it schedules the communication to the underlying pipe (handled by a thread). The qn1.cancel_join_thread() then says "but it's cool if we exit without the scheduled put completing", and of course, a few microseconds later, the worker function exits and the Process exits. It does so without waiting for the thread that is populating the pipe to actually finish; at best it might have sent the initial bytes of the object, but anything that doesn't fit in PIPE_BUF almost certainly gets dropped. You'd need some amazing race conditions to occur to get anything at all, let alone the whole of a large object. So later, when you do:

imdata = q.get()

nothing has actually been sent by the (now exited) Process. When you call q.get() it's waiting for data that never actually got transmitted.

The other answer is correct that in the case of computing and conveying a single value, Queues are overkill. But if you're going to use them, you need to use them properly. The fix would be to:

  1. Remove the call to qn1.cancel_join_thread() so the Process doesn't exit until the data has been transmitted across the pipe.
  2. Rearrange your calls to avoid deadlock.

Rearranging is just this:

    p = Process(target=rz, args=(img, q, q2,))
    p.start()
    imdata = q.get()
    p.join()

This moves p.join() after q.get(). If you try to join first, your main process will be waiting for the child to exit, and the child will be waiting for the queue to be consumed before it will exit. (This might actually work if the Queue's pipe is drained by a thread in the main process, but it's best not to count on implementation details like that; this form is correct regardless of implementation details, as long as puts and gets are matched.)
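Putting both fixes together, a self-contained sketch could look like this. It is not the asker's code: worker and collect are hypothetical names, and a plain list of integers stands in for the large list of image arrays.

```python
from multiprocessing import Process, Queue


def worker(q, count):
    """Build a large result and hand it to the parent through the queue."""
    data = list(range(count))
    # No cancel_join_thread() here: the child only exits once the
    # queue's feeder thread has flushed the data into the pipe.
    q.put(data)


def collect(count):
    """Start the worker, drain the queue, then join the process."""
    q = Queue()
    p = Process(target=worker, args=(q, count))
    p.start()
    result = q.get()  # drain the queue first...
    p.join()          # ...then wait for the child to exit
    return result


if __name__ == "__main__":
    received = collect(1_000_000)
    print(len(received))
```

One put is matched by one get, and the join comes last, so neither side is left waiting on the other.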