
Hadoop Streaming Job Failed (Not Successful) in Python


You are missing a lot of configuration: you need to define the data directories and the like. See here:

http://wiki.apache.org/hadoop/QuickStart

Distributed operation is just like the pseudo-distributed operation described on that page, except (see the configuration sketch after this list):

  1. Specify hostname or IP address of the master server in the values for fs.default.name and mapred.job.tracker in conf/hadoop-site.xml. These are specified as host:port pairs.
  2. Specify directories for dfs.name.dir and dfs.data.dir in conf/hadoop-site.xml. These are used to hold distributed filesystem data on the master node and slave nodes respectively. Note that dfs.data.dir may contain a space- or comma-separated list of directory names, so that data may be stored on multiple devices.
  3. Specify mapred.local.dir in conf/hadoop-site.xml. This determines where temporary MapReduce data is written. It also may be a list of directories.
  4. Specify mapred.map.tasks and mapred.reduce.tasks in conf/mapred-default.xml. As a rule of thumb, use 10x the number of slave processors for mapred.map.tasks, and 2x the number of slave processors for mapred.reduce.tasks.
  5. List all slave hostnames or IP addresses in your conf/slaves file, one per line, and make sure the jobtracker hostname is in your /etc/hosts file, pointing to your jobtracker node.
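
For reference, here is a minimal sketch of what those conf/hadoop-site.xml entries could look like. The hostname "master", the ports, and the paths are placeholders, not values from your setup:

    <?xml version="1.0"?>
    <configuration>
      <!-- step 1: NameNode and JobTracker as host:port pairs -->
      <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
      </property>
      <property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
      </property>
      <!-- step 2: filesystem directories -->
      <property>
        <name>dfs.name.dir</name>
        <value>/home/hadoop/dfs/name</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <!-- may be a comma-separated list to spread data across disks -->
        <value>/home/hadoop/dfs/data</value>
      </property>
      <!-- step 3: temporary MapReduce data -->
      <property>
        <name>mapred.local.dir</name>
        <value>/home/hadoop/mapred/local</value>
      </property>
    </configuration>

The mapred.map.tasks and mapred.reduce.tasks values from step 4 go into conf/mapred-default.xml using the same <property> format, and conf/slaves (step 5) is just a plain text file with one slave hostname per line.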


Well, I was stuck on the same problem for two days. The solution that Joe provided in his other post works well for me.

As a solution to your problem, I suggest:

1) Follow blindly, and only blindly, the instructions on how to set up a single-node cluster here (I assume you have already done so)

2) If at any point you hit a java.io.IOException: Incompatible namespaceIDs error (you will find it if you examine the logs), have a look here; a sketch of the common workaround also appears after these steps

3) REMOVE ALL THE DOUBLE QUOTES FROM YOUR COMMAND; in your example, run:

    ./bin/hadoop jar contrib/streaming/hadoop-0.20.2-streaming.jar \
        -input p1input/* \
        -output p1output \
        -mapper p1mapper.py \
        -reducer p1reducer.py \
        -file /Users/Tish/Desktop/HW1/p1mapper.py \
        -file /Users/Tish/Desktop/HW1/p1reducer.py

This is ridiculous, but it was the point at which I was stuck for two whole days.
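
Regarding step 2, in case the link goes stale: the usual workaround for Incompatible namespaceIDs is to wipe the DataNode's data directory and restart, so the DataNode re-registers under the NameNode's new namespaceID. This is only a sketch; it assumes the default hadoop.tmp.dir of /tmp/hadoop-${user.name}, and it deletes that node's HDFS blocks, so use it on a scratch cluster only:

    bin/stop-all.sh
    # path is an assumption -- check your dfs.data.dir setting first;
    # deleting it discards this node's HDFS blocks
    rm -rf /tmp/hadoop-$USER/dfs/data
    bin/start-all.sh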
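
Finally, remember what the streaming contract actually is: the mapper and reducer just read lines from stdin and write key<TAB>value lines to stdout. Your p1mapper.py is not shown, so the following is only an illustrative word-count-style mapper, not your code. Also make sure both scripts start with a shebang line and are executable (chmod +x), another frequent cause of a failed streaming job:

    #!/usr/bin/env python
    # Illustrative streaming mapper (hypothetical -- the real
    # p1mapper.py from the question is not shown).
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            # streaming expects one key<TAB>value pair per line
            print("%s\t%s" % (word, 1))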