Running a job using hadoop streaming and mrjob: PipeMapRed.waitOutputThreads(): subprocess failed with code 1


Error code 1 is a generic error for Hadoop Streaming. You can get this error code for two main reasons:

  • Your mapper and reducer scripts are not executable. Include #!/usr/bin/python at the beginning of each script and make it executable with chmod +x (see the sketch after this list).

  • Your Python program is simply written wrong: you could have a syntax error or a logic bug.
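For reference, here is a minimal sketch of what a correctly set-up streaming mapper can look like (a hypothetical word-count mapper, not any script from the question):

    #!/usr/bin/env python
    # Hypothetical minimal streaming mapper: emits one (word, 1) pair per
    # input token. The shebang on the first line, together with chmod +x,
    # is what lets Hadoop Streaming execute the script directly.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print('%s\t%d' % (word, 1))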

Unfortunately, error code 1 gives you no detail about what is actually wrong with your Python program.

I was stuck with error code 1 for a while myself, and the way I figured it out was to simply run my mapper script as a standalone Python program: python mapper.py

After doing this, I got a regular Python traceback that told me I was passing a function the wrong type of argument. I fixed that bug, and everything worked after that. So if possible, run your mapper or reducer script as a standalone Python program and see whether the traceback gives you any insight into your error.
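If the script only fails on the cluster (for example, because the real input triggers the bug), another option is to write the full traceback to stderr, which Hadoop Streaming captures in the task attempt logs. This is a sketch building on the advice above, not part of the original answer; the main() split is only for illustration:

    #!/usr/bin/env python
    # Sketch: surface the real Python error in the Hadoop task logs.
    # Anything a streaming task writes to stderr ends up in its attempt
    # log, so the traceback becomes visible instead of a bare
    # "subprocess failed with code 1".
    import sys
    import traceback

    def main():
        for line in sys.stdin:
            # ... real mapper logic goes here; echoing is a placeholder ...
            print(line.strip())

    if __name__ == '__main__':
        try:
            main()
        except Exception:
            traceback.print_exc(file=sys.stderr)
            sys.exit(1)  # still fail the task, but with a readable traceback logged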


I got the same error, subprocess failed with code 1, when running:

[cloudera@quickstart ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -input /user/cloudera/input -output /user/cloudera/output_join -mapper /home/cloudera/join1_mapper.py -reducer /home/cloudera/join1_reducer.py
  1. This is primarily because Hadoop is unable to access your input files, or because your input contains something more than required, or something is missing. So be very careful with the input directory and the files in it. I would say, place only the exact input files required for the assignment in the input directory and remove the rest.

  2. Also make sure your mapper and reducer files are executable: chmod +x mapper.py and chmod +x reducer.py

  3. Run the mapper and reducer Python files locally using cat (a reducer sketch follows below this list).
     Using only the mapper: cat join2_gen*.txt | ./mapper.py | sort
     Using the reducer as well: cat join2_gen*.txt | ./mapper.py | sort | ./reducer.py
     The reason for running them using cat is that if your input files have any errors you can remove them before you run on the Hadoop cluster. Sometimes map/reduce jobs cannot surface the Python errors!
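The sort in those pipelines matters: a streaming reducer sees all values for a key on adjacent lines only because the input is sorted, locally by sort and on the cluster by the shuffle phase. Here is a minimal sketch of a reducer written for that pipeline (a hypothetical sum-by-key reducer, not the join reducer from the assignment):

    #!/usr/bin/env python
    # Hypothetical reducer: sums counts per key. It relies on its input
    # being sorted by key, which `sort` provides in the local pipeline
    # and the shuffle phase provides on the cluster.
    import sys
    from itertools import groupby

    def parse(stdin):
        for line in stdin:
            key, value = line.rstrip('\n').split('\t', 1)
            yield key, value

    for key, group in groupby(parse(sys.stdin), key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        print('%s\t%d' % (key, total))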


I faced the same problem when running my job: my mapper and reducer scripts were not executable.

Adding #!/usr/bin/python at the top of my files fixed the issue.
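Finally, since the question itself uses mrjob: the "run it standalone first" advice is built into mrjob, because its default inline runner executes the job as ordinary Python and shows the real traceback. A minimal hypothetical job, assuming mrjob is installed (the class and file names here are made up):

    #!/usr/bin/env python
    # Hypothetical mrjob word count. Running it without -r hadoop uses
    # the inline runner, so any Python error appears as a normal traceback
    # instead of PipeMapRed.waitOutputThreads() failing with code 1.
    from mrjob.job import MRJob

    class MRWordCount(MRJob):

        def mapper(self, _, line):
            for word in line.split():
                yield word, 1

        def reducer(self, word, counts):
            yield word, sum(counts)

    if __name__ == '__main__':
        MRWordCount.run()

Debug it with python mr_word_count.py input.txt first, and only switch to -r hadoop once that runs cleanly.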