How do I pass a parameter to a python Hadoop streaming job?

The argument to the command-line option -reducer can be any command, so you can try:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input inputDirs \
    -output outputDir \
    -mapper \
    -reducer ' 1 2 3' \
    -file \
    -file

assuming is made executable. Disclaimer: I have not tried it, but I have passed similar complex strings to -mapper and -reducer before.
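If the -reducer string is passed through as a command, the extra words after the script name should reach the script as positional arguments in sys.argv[1:]. Below is a minimal sketch of a reducer that uses the first such argument as a threshold. The behavior (summing counts per key and keeping keys whose total reaches the threshold) and the names reduce_lines and parse_min_count are hypothetical, chosen only to show where the parameter ends up.

```python
from itertools import groupby


def parse_key_value(line):
    # Streaming feeds the reducer "key<TAB>value" lines, sorted by key.
    key, _, value = line.rstrip("\n").partition("\t")
    return key, int(value)


def parse_min_count(argv):
    # argv[0] is the script name; argv[1:] are the words that followed it
    # in the -reducer string. Default to 1 when no parameter was given.
    return int(argv[1]) if len(argv) > 1 else 1


def reduce_lines(lines, min_count):
    """Sum values per key and yield keys whose total is >= min_count."""
    pairs = (parse_key_value(line) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(value for _, value in group)
        if total >= min_count:
            yield f"{key}\t{total}"
```

Saved as an executable script, it would finish with an `if __name__ == "__main__":` block that loops over `reduce_lines(sys.stdin, parse_min_count(sys.argv))` and prints each line.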

That said, have you tried the

-cmdenv name=value

option, and having your Python reducer read its value from the environment? It's just another way to do it.

In your Python code,

import os
(...)
os.environ["PARAM_OPT"]
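A slightly more defensive version of that lookup is sketched below. It uses os.environ.get so the script still runs (for example when testing it locally) if the variable was not set with -cmdenv, and converts the string value to the type you need. The function name read_param and its default/cast parameters are illustrative, not part of Hadoop or Python.

```python
import os


def read_param(name="PARAM_OPT", default=None, cast=str):
    """Return an environment variable (e.g. one set via -cmdenv), converted with cast.

    Environment values are always strings, so cast turns them into the
    type the script needs. If the variable is missing, default is
    returned unchanged.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return cast(raw)
```

For example, `threshold = read_param("PARAM_OPT", default=1, cast=int)` would work both on the cluster and in a local `cat input | python reducer.py` test.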

In your Hadoop command, include:

hadoop jar \
    (...)
    -cmdenv PARAM_OPT=value \
    (...)
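If you launch the job from Python rather than a shell, one way to sidestep quoting problems is to build the command as an argument list, so a string such as 'reducer.py 1 2 3' stays a single argument and each -cmdenv NAME=VALUE pair is passed as-is. This is a sketch under assumptions: build_streaming_cmd, as_shell_string, and the jar and script names in the test are hypothetical; only the streaming options themselves (-input, -output, -mapper, -reducer, -file, -cmdenv) come from the answers here. shlex.join needs Python 3.8 or later.

```python
import shlex


def build_streaming_cmd(jar, input_dir, output_dir, mapper, reducer,
                        files=(), env=None):
    """Build a Hadoop streaming command line as a list of arguments.

    mapper and reducer may contain their own parameters, e.g.
    "reducer.py 1 2 3"; each stays one list element. Every entry in env
    becomes a separate "-cmdenv NAME=VALUE" option.
    """
    cmd = ["hadoop", "jar", jar,
           "-input", input_dir,
           "-output", output_dir,
           "-mapper", mapper,
           "-reducer", reducer]
    for path in files:
        cmd += ["-file", path]
    for name, value in (env or {}).items():
        cmd += ["-cmdenv", f"{name}={value}"]
    return cmd


def as_shell_string(cmd):
    # Quote each argument so the printed command can be pasted into a shell.
    return shlex.join(cmd)
```

The list could then be run with `subprocess.run(cmd, check=True)`, which passes the arguments directly without going through a shell.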

You can pass arguments to -mapper and -reducer as in the command below:

hadoop jar hadoop-streaming.jar \
    -mapper ' arg1 arg2' -file \
    -reducer ' arg3' -file \
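On the receiving side, a mapper given two positional parameters can parse them with argparse instead of indexing sys.argv by hand. Below is a minimal sketch assuming, purely for illustration, that arg1 is a 0-based column index and arg2 a field separator; parse_args, map_lines, and that meaning of the two arguments are hypothetical.

```python
import argparse


def parse_args(argv):
    # argv excludes the script name: for -mapper 'mapper.py 1 ,' it would be ["1", ","].
    parser = argparse.ArgumentParser(description="Hypothetical streaming mapper")
    parser.add_argument("column", type=int, help="0-based field to emit")
    parser.add_argument("sep", help="input field separator")
    return parser.parse_args(argv)


def map_lines(lines, column, sep):
    """Yield 'field<TAB>1' for the chosen column; skip lines that are too short."""
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if column < len(fields):
            yield f"{fields[column]}\t1"
```

In the script itself you would call `args = parse_args(sys.argv[1:])` and print each line of `map_lines(sys.stdin, args.column, args.sep)`. argparse also gives a readable error if the -mapper string is missing an argument.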
