passing numpy arrays through multiprocessing.Queue

The issue isn't with numpy, but with pickle's default settings: the default protocol represents data as printable ASCII, so the output is human readable but bulky. You can tell pickle to use a binary protocol instead by passing the protocol argument (protocol=-1 selects the highest protocol available).

import numpy
import cPickle as pickle

N = 1000
a0 = pickle.dumps(numpy.zeros(N))
a1 = pickle.dumps(numpy.zeros(N), protocol=-1)
print "a0", len(a0)   # 32155
print "a1", len(a1)   #  8133
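
As a quick sanity check, here's a minimal sketch (not from the original answer) showing that protocol=-1 is just shorthand for pickle.HIGHEST_PROTOCOL and that the array round-trips intact:

import numpy
import cPickle as pickle

a = numpy.arange(1000, dtype=numpy.float64)
# protocol=-1 selects the highest protocol the module supports
data = pickle.dumps(a, protocol=pickle.HIGHEST_PROTOCOL)
b = pickle.loads(data)
assert (a == b).all()   # the array survives the round trip unchanged
print len(data)         # binary protocol: much smaller than protocol 0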

Also note that if you want to reduce processor work and time, you should probably use cPickle instead of pickle (but the space savings from the binary protocol apply regardless of which pickle module you use).
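
A rough way to see the speed difference for yourself is a small timing sketch like the one below (an illustration only, assuming Python 2, where pickle and cPickle are separate modules):

import timeit

setup = "import numpy; import %s as p; a = numpy.zeros(100000)"
for mod in ("pickle", "cPickle"):
    # time 100 binary-protocol dumps of the same array with each module
    t = timeit.timeit("p.dumps(a, protocol=-1)", setup=setup % mod, number=100)
    print mod, t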

On shared memory:
On the question of shared memory, there are a few things to consider. Shared data typically adds a significant amount of complexity to code: for every line of code that uses that data, you have to worry about whether some other line in another process is simultaneously using it. How hard that is depends on what you're doing. The advantage is that you save the time spent sending the data back and forth. The question that Eelco cites is for a 60GB array, and for that there's really no choice: it has to be shared. On the other hand, for most reasonably complex code, deciding to share data simply to save a few microseconds or bytes would probably be one of the worst premature optimizations one could make.
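
If you do decide sharing is worth it, here is a minimal sketch (not from the original answer) of the usual pattern: allocate a multiprocessing.Array and wrap the same buffer as a numpy array in each process. It assumes a fixed-size float64 array; for truly read-only data you could use a RawArray and skip the lock.

import multiprocessing
import numpy

def worker(shared, shape):
    # Re-wrap the shared buffer as a numpy array; no data is copied.
    a = numpy.frombuffer(shared.get_obj()).reshape(shape)
    print "worker sees", a[0], a[-1]

if __name__ == '__main__':
    shape = (1000,)
    shared = multiprocessing.Array('d', shape[0])   # 'd' = C double = float64
    a = numpy.frombuffer(shared.get_obj()).reshape(shape)
    a[:] = numpy.arange(shape[0])                   # fill it in the parent
    p = multiprocessing.Process(target=worker, args=(shared, shape))
    p.start()
    p.join()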


Related question: Share Large, Read-Only Numpy Array Between Multiprocessing Processes

That should cover it all. Pickling of incompressible binary data is a pain regardless of the protocol used, so this solution is much to be preferred.