Parallelizing the "Reduce" in "MapReduce" Parallelizing the "Reduce" in "MapReduce" multithreading multithreading

Parallelizing the "Reduce" in "MapReduce"


If your reduction underlying operation is associative*, you can play with the order of operations and locality. Therefore you often have a tree-like structure in the 'gather' phase, so you can do it in several passes in logarithmic time:

a  +  b  +  c  +  d \   /       \   / (a+b)       (c+d)     \       /   ((a+b)+(c+d))

instead of (((a+b)+c)+d)

If your operation is commutative, further optimization are possible as you can gather in different order (it may be important for data alignment when those operations are vector operations for example)

[*] your real desired mathematical operations, not those on effective types like floats of course.


Yes, if the operator is associative. For example, you can parallelise summing a list of numbers:

step 1: 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8step 2:   3   +   7   +   11  +   15step 3:       10      +       26step 4:               36

This works because (a+b)+c = a+(b+c), i.e. the order in which the additions are performed doesn't matter.