
Hadoop Streaming Job failed error in python


Your -mapper and -reducer should just be the script name.

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-0.20.0-streaming.jar -file /home/hadoop/mapper.py -mapper mapper.py -file /home/hadoop/reducer.py -reducer reducer.py -input my-input/* -output my-output

When the job runs, your scripts are copied into the task attempt's working directory, which the task sees as ".", so you refer to them by name only. (FYI, if you ever want to add another -file, such as a lookup table, you can open it in Python as if it were in the same directory as your scripts while your script is running in the M/R job, as shown below.)
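For example, if you shipped an extra lookup file with another -file option, the mapper could open it by its bare name. This is only a sketch: the file name lookup.txt and the tab-separated format are illustrative assumptions, not part of the original question.

    #!/usr/bin/env python
    # Sketch only: assumes a lookup file shipped with an extra
    # "-file /home/hadoop/lookup.txt" option and a tab-separated format.
    # Hadoop copies -file arguments into the task's working directory,
    # so the bare file name is enough inside the mapper.
    import sys

    lookup = {}
    with open('lookup.txt') as f:
        for line in f:
            key, value = line.rstrip('\n').split('\t', 1)
            lookup[key] = value

    for line in sys.stdin:
        word = line.strip()
        # Emit each input word with its looked-up value (or a default)
        print('%s\t%s' % (word, lookup.get(word, 'UNKNOWN')))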

Also make sure you have run chmod a+x mapper.py and chmod a+x reducer.py.
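As a point of reference, a minimal mapper.py might look like the sketch below (the word-count key/value scheme is just an assumption for illustration): it has a shebang line, reads lines from stdin, and writes tab-separated pairs to stdout, which is the contract Hadoop Streaming expects.

    #!/usr/bin/env python
    # Minimal word-count mapper sketch (illustrative key/value scheme).
    # Reads lines from stdin and emits "word<TAB>1" pairs on stdout,
    # which Hadoop Streaming sorts and passes on to the reducer.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print('%s\t%d' % (word, 1))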


Try to add

 #!/usr/bin/env python

to the top of your script.

Or,

-mapper 'python m.py' -reducer 'python r.py'
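For completeness, here is a matching reducer sketch under the same illustrative word-count assumption as the mapper above; it relies on streaming sorting the mapper output by key before the reducer reads it.

    #!/usr/bin/env python
    # Matching word-count reducer sketch (illustrative only). Streaming
    # sorts the mapper output by key, so identical words arrive on
    # consecutive lines of stdin.
    import sys

    current_word, current_count = None, 0

    for line in sys.stdin:
        word, count = line.rstrip('\n').split('\t', 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print('%s\t%d' % (current_word, current_count))
            current_word, current_count = word, int(count)

    # Flush the final key
    if current_word is not None:
        print('%s\t%d' % (current_word, current_count))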


You need to tell Hadoop explicitly that the mapper and reducer are Python scripts, since streaming accepts several kinds of executables. You can use either single quotes or double quotes:

-mapper "python mapper.py" -reducer "python reducer.py" 

or

-mapper 'python mapper.py' -reducer 'python reducer.py'

The full command goes like this:

hadoop jar /path/to/hadoop-mapreduce/hadoop-streaming.jar \
    -input /path/to/input \
    -output /path/to/output \
    -mapper 'python mapper.py' \
    -reducer 'python reducer.py' \
    -file /path/to/mapper-script/mapper.py \
    -file /path/to/reducer-script/reducer.py