Where do I start with distributed computing? Where do I start with distributed computing? hadoop hadoop

Where do I start with distributed computing?


Maybe you can read some papers related to MapReduce and distributed computing first, to gain a better understanding of it. Here are some I would like to recommand:

On the other hand, if you want to know better of Hadoop, maybe you can start reading Hadoop MapReduce framework source code.


Currently, bookwise I would check out - Hadoop A Definitive Guide. Its written by Tom White who has worked on Hadoop for a good while now, and works at Cloudera with Doug Cutting (Hadoop creator).

Also on the free side, Jimmy Lin from UMD has written a book called: Data-Intensive Text Processing with MapReduce. Here's a link to the final pre-production verison (link provided by the author on his website).


Hadoop is not necessarily the best tool for all distributed computing problems. Despite its power, it also has a pretty steep learning curve and cost of ownership.You might want to clarify your requirements and look for suitable alternatives in the Java world, such as HTCondor, JPPF or GridGain (my apologies to those I do not mention).