Is there a good online tutorial for Hadoop development on a Windows 7 machine? [closed]


The Hadoop tutorial on the Yahoo Developer Network is outdated and problematic. Half of the steps didn't work for me at all (I was running their image in VMware Player on Windows 7), and the other half were vague. The Java code examples were poorly written and wouldn't compile. At any rate, they are written for the old Hadoop API.

I gave up on that tutorial and instead used the Cloudera Demo VM image. This comes pre-configured with Hadoop, Pig, Hive, HBase, etc. I was in business at once and had no problems compiling and running Hadoop jobs and Pig scripts.

The Cloudera Demo VM downloads on their main support page (https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VM) are all 64-bit. If you are looking for a 32-bit version like I was, you can get one here: https://downloads.cloudera.com/cloudera-demo-0.3.7.vmwarevm.tar.bz2

This one has a slightly older version of the Cloudera distro (CDH3u0) running on Ubuntu 10.10 with the GNOME desktop. I installed Eclipse for compiling my Hadoop jobs, but didn't bother trying to install the Hadoop plugin, which I've heard is problematic. The first time around, I accidentally updated the Cloudera distro to CDH3u3 via the system's Update Manager, and this messed up my Hadoop configuration. I didn't know how to reconfigure it properly, so I just started over from the original image.

To get Pig running, you need to first set the JAVA_HOME variable: export JAVA_HOME=/usr/lib/jvm/java-6-sun
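As a slightly more defensive variant of the export above, here is a minimal sketch that only sets JAVA_HOME if a java binary is actually found under the given directory. The function name `set_java_home` is hypothetical, and the path is the one from this answer; it may well differ on other images or JDK versions.

```shell
# Hypothetical helper: export JAVA_HOME only if the directory contains bin/java.
set_java_home() {
    candidate="$1"
    if [ -x "$candidate/bin/java" ]; then
        export JAVA_HOME="$candidate"
        echo "JAVA_HOME set to $JAVA_HOME"
    else
        echo "No java binary under $candidate; check /usr/lib/jvm for the installed JDK" >&2
        return 1
    fi
}

# Path taken from the answer above (an assumption for any other VM image).
set_java_home /usr/lib/jvm/java-6-sun || echo "JAVA_HOME left unchanged" >&2
```

The export only lasts for the current shell session, so you would probably want to put it in your ~/.bashrc as well if you don't want to retype it every time.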

Unfortunately, I wasted a ton of time with that old YDN tutorial before a Java developer friend familiar with Hadoop pointed me to the Cloudera distribution.


I was completely new to Hadoop, and honestly I found the Cloudera tutorials and information completely unhelpful. Give the IBM ones a shot; they're super helpful and very friendly for beginners. They have step-by-step instructions for pretty much all of the core Hadoop applications and a few specific to IBM's distro.

Here's the download link:

https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=swg-ibmibqsevmw&S_TACT=109HF38W&S_CMP=109HF

You have to make an account but it's free and doesn't take that long.

I can't post more than one link right now, but it's pretty easy to find the tutorials online, and they also exist within the VM.

Also, there's a forum where I've posted my questions when I got stuck, and somebody from IBM has always helped me out within an hour to a day. I can't post the link, but if you Google "IBM InfoSphere BigInsights Forum", it's the first hit.

Good Luck!


I am trying to learn Hadoop right now too. What I did was download VirtualBox ( http://www.virtualbox.org/ ), load some Linux images on it, and start following tutorials.

You can even get a pre-made Hadoop setup image from Cloudera. I think this approach is far better than installing and setting up on your main machine, because if there's a problem your main machine won't be affected (you can simply revert to an old copy of your virtual Linux image, or scrap it and start again, without any impact).

Good luck!