Hadoop Streaming Job Failed (Not Successful) in Python
You are missing a lot of configurations and you need to define directories and such. See here:
http://wiki.apache.org/hadoop/QuickStart
Distributed operation is just like the pseudo-distributed operation described above, except:
- Specify hostname or IP address of the master server in the values for fs.default.name and mapred.job.tracker in conf/hadoop-site.xml. These are specified as host:port pairs.
- Specify directories for dfs.name.dir and dfs.data.dir in conf/hadoop-site.xml. These are used to hold distributed filesystem data on the master node and slave nodes respectively. Note that dfs.data.dir may contain a space- or comma-separated list of directory names, so that data may be stored on multiple devices.
- Specify mapred.local.dir in conf/hadoop-site.xml. This determines where temporary MapReduce data is written. It also may be a list of directories.
- Specify mapred.map.tasks and mapred.reduce.tasks in conf/mapred-default.xml. As a rule of thumb, use 10x the number of slave processors for mapred.map.tasks, and 2x the number of slave processors for mapred.reduce.tasks.
- List all slave hostnames or IP addresses in your conf/slaves file, one per line, and make sure the jobtracker hostname is in your /etc/hosts file, pointing to your jobtracker node.
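To make the steps above concrete, here is a minimal conf/hadoop-site.xml sketch. The hostnames, ports, and paths are placeholders of my own, not values from the original post — substitute your cluster's actual master hostname and local directories:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- fs.default.name and mapred.job.tracker as host:port pairs
       pointing at the master node (placeholder host "master") -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
  <!-- where the namenode stores filesystem metadata -->
  <property>
    <name>dfs.name.dir</name>
    <value>/var/hadoop/dfs/name</value>
  </property>
  <!-- comma-separated list, so data can be spread over multiple devices -->
  <property>
    <name>dfs.data.dir</name>
    <value>/var/hadoop/dfs/data1,/var/hadoop/dfs/data2</value>
  </property>
  <!-- temporary MapReduce data; may also be a list of directories -->
  <property>
    <name>mapred.local.dir</name>
    <value>/var/hadoop/mapred/local</value>
  </property>
</configuration>
```

And a matching conf/slaves file, one hostname per line (again placeholders):

```
slave1
slave2
```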
Well, I was stuck on the same problem for two days. The solution that Joe provided in his other post works well for me.
As a solution to your problem I suggest:
1) Follow, blindly and only blindly, the instructions on how to set up a single-node cluster here (I assume you have already done so)
2) If at any point you hit a java.io.IOException: Incompatible namespaceIDs error (you will find it if you examine the logs), have a look here
3) REMOVE ALL THE DOUBLE QUOTES FROM YOUR COMMAND. In your example, run:

```
./bin/hadoop jar contrib/streaming/hadoop-0.20.2-streaming.jar \
    -input p1input/* \
    -output p1output \
    -mapper p1mapper.py \
    -reducer p1reducer.py \
    -file /Users/Tish/Desktop/HW1/p1mapper.py \
    -file /Users/Tish/Desktop/HW1/p1reducer.py
```
It is ridiculous, but this was the point at which I was stuck for two whole days.
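One more streaming gotcha worth checking while you are at it: each script must start with a shebang line (and be executable), because Hadoop Streaming launches it as a standalone process that reads records from stdin and writes tab-separated key/value pairs to stdout. The real logic of p1mapper.py is unknown, so this is only a hypothetical word-count-style mapper sketched in that shape:

```python
#!/usr/bin/env python
# Hypothetical streaming mapper: Hadoop Streaming pipes input records
# to stdin and expects "key<TAB>value" lines on stdout.
import sys

def map_stream(lines):
    # Emit (word, 1) for every whitespace-separated token.
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word, 1)

if __name__ == "__main__":
    for record in map_stream(sys.stdin):
        print(record)
```

The reducer follows the same contract: it reads the sorted "key<TAB>value" lines from stdin and writes its aggregated output to stdout.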