How do I pass a parameter to a python Hadoop streaming job?


The argument to the -reducer command-line option can be any command, so you can try:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input inputDirs \
    -output outputDir \
    -mapper myMapper.py \
    -reducer 'myReducer.py 1 2 3' \
    -file myMapper.py \
    -file myReducer.py

This assumes myReducer.py has been made executable. Disclaimer: I have not tried this exact command, but I have passed similar complex strings to -mapper and -reducer before.
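For concreteness, here is a minimal sketch of how myReducer.py could pick up those extra arguments. What the values 1 2 3 mean, and the scaling logic, are assumptions for illustration; only the sys.argv handling is the point:

#!/usr/bin/env python
# myReducer.py -- sketch: read the extra arguments passed in the -reducer string.
# Using the first argument as a multiplier is a hypothetical example.
import sys

args = sys.argv[1:]   # ['1', '2', '3'] from -reducer 'myReducer.py 1 2 3'
scale = int(args[0])

for line in sys.stdin:
    key, _, value = line.rstrip("\n").partition("\t")
    # emit each incoming value scaled by the first command-line argument
    print("%s\t%d" % (key, int(value or 1) * scale))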

That said, have you tried the

-cmdenv name=value

option, and just have your Python reducer get its value from the environment? It's just another way to do things.


In your Python code,

import os
# ...
os.environ["PARAM_OPT"]

In your Hadoop command, include:

hadoop jar \
    ... \
    -cmdenv PARAM_OPT=value \
    ...
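Putting it together, here is a self-contained sketch of a reducer that reads PARAM_OPT from the environment; treating it as an integer threshold on summed counts is an assumption for illustration:

#!/usr/bin/env python
# Sketch of a streaming reducer parameterized via -cmdenv PARAM_OPT=value.
# The threshold semantics are hypothetical.
import os
import sys

threshold = int(os.environ.get("PARAM_OPT", "0"))

current_key, total = None, 0
for line in sys.stdin:
    key, _, value = line.rstrip("\n").partition("\t")
    if key != current_key:
        # streaming input arrives sorted by key, so a key change
        # means the previous key is complete
        if current_key is not None and total >= threshold:
            print("%s\t%d" % (current_key, total))
        current_key, total = key, 0
    total += int(value or 1)

if current_key is not None and total >= threshold:
    print("%s\t%d" % (current_key, total))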


You can pass arguments to -mapper and -reducer as in the command below:

hadoop jar hadoop-streaming.jar \
    -mapper 'count_mapper.py arg1 arg2' -file count_mapper.py \
    -reducer 'count_reducer.py arg3' -file count_reducer.py
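For illustration, count_mapper.py might consume its two arguments like this; treating arg1 as a field delimiter and arg2 as a minimum token length is an assumption:

#!/usr/bin/env python
# count_mapper.py -- sketch of a mapper taking two command-line arguments.
# The delimiter/min-length interpretation is hypothetical.
import sys

delimiter, min_len = sys.argv[1], int(sys.argv[2])

for line in sys.stdin:
    for token in line.strip().split(delimiter):
        if len(token) >= min_len:
            print("%s\t1" % token)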

You can review this link for more details.