What's the overhead of the different forms of parallelism in Julia v0.5? What's the overhead of the different forms of parallelism in Julia v0.5? multithreading multithreading

What's the overhead of the different forms of parallelism in Julia v0.5?


A few caveats: 1. I'm speaking from experience with version 0.4.6, (and prior), haven't played with 0.5 yet (but, as I hope my answer below demonstrates, I don't think this is essential vis-a-vis the response I give). 2. this isn't a fully comprehensive answer.

Nevertheless, from my experience, the overhead for multiple processes itself is very small provided that you aren't dealing with data movement issues. In other words, in my experience, any time that you ever find yourself in a situation of wishing something were faster than a single process on your CPU can manage, you're well past the point where parallelism will be beneficial. For instance, in the sum of random numbers example that you gave, I found through testing just now that the break-even point was somewhere around 10,000 random numbers. Anything more and parallelism was the clear winner. Generating 10,000 random number is trivial for modern computers, taking a tiny fraction of a second, and is well below the threshold where I'd start getting frustrated by the slowness of my scripts and want parallelism to speed them up.

Thus, I at least am of the opinion, that although there are probably even more wonderful things that the Julia developers could do to cut down on the overhead even more, at this point, anything pertinent to Julia isn't going to be so much of your limiting factor, at least in terms of the computation aspects of parallelism. I think that there are still improvements to be made in terms of enhancing both the ease and the efficiency of parallel data movement (I like the package that you've started on that topic as a good step. You and I would probably both agree there's still a ways more to go). But, the big limiting factors will be:

  1. How much data do you need to be moving around between processes?
  2. How much read/write to your memory do you need to be doing during your computations? (e.g. flops per read/write)

Aspect 1. might at times lean against using parallelism. Aspect 2. is more likely just to mean that you won't get so much benefit from it. And, at least as I interpret "overhead," neither of these really fall so directly into that specific consideration. And, both of these are, I believe, going to be far more heavily determined by your system hardware than by Julia.