Why isn't Hadoop implemented using MPI? Why isn't Hadoop implemented using MPI? hadoop hadoop

Why isn't Hadoop implemented using MPI?


One of the big features of Hadoop/map-reduce is the fault tolerance. Fault tolerance is not supported in most (any?) current MPI implementations. It is being thought about for future versions of OpenMPI.

Sandia labs has a version of map-reduce which uses MPI, but it lacks fault tolerance.


MPI is Message Passing Interface. Right there in the name - there is no data locality. You send the data to another node for it to be computed on. Thus MPI is network-bound in terms of performance when working with large data.

MapReduce with the Hadoop Distributed File System duplicates data so that you can do your compute in local storage - streaming off the disk and straight to the processor. Thus MapReduce takes advantage of local storage to avoid the network bottleneck when working with large data.

This is not to say that MapReduce doesn't use the network... it does: and the shuffle is often the slowest part of a job! But it uses it as little, and as efficiently as possible.

To sum it up: Hadoop (and Google's stuff before it) did not use MPI because it could not have used MPI and worked. MapReduce systems were developed specifically to address MPI's shortcomings in light of trends in hardware: disk capacity exploding (and data with it), disk speed stagnant, networks slow, processor gigahertz peaked, multi-core taking over Moore's law.


The truth is Hadoop could be implemented using MPI. MapReduce has been used via MPI for as long as MPI has been around. MPI has functions like 'bcast' - broadcast all data, 'alltoall' - send all data to all nodes, 'reduce' and 'allreduce'. Hadoop removes the requirement to explicitly implement your data distribution and gather your result methods by packaging an outgoing communication command with a reduce command. The upside is you need to make sure your problem fits the 'reduce' function before you implement Hadoop. It could be your problem is a better fit for 'scatter'/'gather' and you should use Torque/MAUI/SGE with MPI instead of Hadoop. Finally, MPI does not write your data to disk as described in another post, unless you follow your receive method with a write to disk. It works just as Hadoop does by sending your process/data somewhere else to do the work. The important part is to understand your problem with enough detail to be sure MapReduce is the most efficient parallelization strategy, and be aware that many other strategies exist.