Running a job using hadoop streaming and mrjob: PipeMapRed.waitOutputThreads(): subprocess failed with code 1


Error code 1 is a generic error for Hadoop Streaming. You can get this error code for two main reasons:

  • Your mapper and reducer scripts are not executable. Include #!/usr/bin/python at the beginning of each script and make it executable with chmod +x (see the sketch after this list).

  • Your Python program is simply written wrong: you could have a syntax error or a logic bug.
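For reference, here is a minimal sketch of what a correctly set-up streaming mapper can look like (a hypothetical word-count mapper, not any script from the question):

    #!/usr/bin/env python
    # Hypothetical minimal streaming mapper: emits one (word, 1) pair per
    # input token. The shebang on the first line, together with chmod +x,
    # is what lets Hadoop Streaming execute the script directly.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print('%s\t%d' % (word, 1))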

Unfortunately, error code 1 gives you no detail about what is actually wrong with your Python program.

I was stuck with error code 1 for a while myself, and the way I figured it out was to simply run my mapper script as a standalone Python program: python mapper.py

After doing this, I got a regular Python traceback that told me I was passing a function the wrong type of argument. I fixed that bug, and everything worked after that. So if possible, run your mapper or reducer script as a standalone Python program and see whether the traceback gives you any insight into your error.
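If the script only fails on the cluster (for example, because the real input triggers the bug), another option is to write the full traceback to stderr, which Hadoop Streaming captures in the task attempt logs. This is a sketch building on the advice above, not part of the original answer; the main() split is only for illustration:

    #!/usr/bin/env python
    # Sketch: surface the real Python error in the Hadoop task logs.
    # Anything a streaming task writes to stderr ends up in its attempt
    # log, so the traceback becomes visible instead of a bare
    # "subprocess failed with code 1".
    import sys
    import traceback

    def main():
        for line in sys.stdin:
            # ... real mapper logic goes here; echoing is a placeholder ...
            print(line.strip())

    if __name__ == '__main__':
        try:
            main()
        except Exception:
            traceback.print_exc(file=sys.stderr)
            sys.exit(1)  # still fail the task, but with a readable traceback logged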


I got the same error, subprocess failed with code 1, when running:

[cloudera@quickstart ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -input /user/cloudera/input -output /user/cloudera/output_join -mapper /home/cloudera/join1_mapper.py -reducer /home/cloudera/join1_reducer.py
  1. This is primarily because Hadoop is unable to access your input files, or because your input contains something more than required, or something is missing. So be very careful with the input directory and the files in it. I would say, place only the exact input files required for the assignment in the input directory and remove the rest.

  2. Also make sure your mapper and reducer files are executable: chmod +x mapper.py and chmod +x reducer.py

  3. Run the mapper and reducer Python files locally using cat (a reducer sketch follows below this list).
     Using only the mapper: cat join2_gen*.txt | ./mapper.py | sort
     Using the reducer as well: cat join2_gen*.txt | ./mapper.py | sort | ./reducer.py
     The reason for running them using cat is that if your input files have any errors you can remove them before you run on the Hadoop cluster. Sometimes map/reduce jobs cannot surface the Python errors!
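The sort in those pipelines matters: a streaming reducer sees all values for a key on adjacent lines only because the input is sorted, locally by sort and on the cluster by the shuffle phase. Here is a minimal sketch of a reducer written for that pipeline (a hypothetical sum-by-key reducer, not the join reducer from the assignment):

    #!/usr/bin/env python
    # Hypothetical reducer: sums counts per key. It relies on its input
    # being sorted by key, which `sort` provides in the local pipeline
    # and the shuffle phase provides on the cluster.
    import sys
    from itertools import groupby

    def parse(stdin):
        for line in stdin:
            key, value = line.rstrip('\n').split('\t', 1)
            yield key, value

    for key, group in groupby(parse(sys.stdin), key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        print('%s\t%d' % (key, total))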


I faced the same problem when running my job: my mapper and reducer scripts were not executable.

Adding #!/usr/bin/python at the top of my files fixed the issue.
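Finally, since the question itself uses mrjob: the "run it standalone first" advice is built into mrjob, because its default inline runner executes the job as ordinary Python and shows the real traceback. A minimal hypothetical job, assuming mrjob is installed (the class and file names here are made up):

    #!/usr/bin/env python
    # Hypothetical mrjob word count. Running it without -r hadoop uses
    # the inline runner, so any Python error appears as a normal traceback
    # instead of PipeMapRed.waitOutputThreads() failing with code 1.
    from mrjob.job import MRJob

    class MRWordCount(MRJob):

        def mapper(self, _, line):
            for word in line.split():
                yield word, 1

        def reducer(self, word, counts):
            yield word, sum(counts)

    if __name__ == '__main__':
        MRWordCount.run()

Debug it with python mr_word_count.py input.txt first, and only switch to -r hadoop once that runs cleanly.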