How do I pass a parameter to a python Hadoop streaming job?
The argument to the command-line option -reducer can be any command, so you can try:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input inputDirs \
    -output outputDir \
    -mapper myMapper.py \
    -reducer 'myReducer.py 1 2 3' \
    -file myMapper.py \
    -file myReducer.py
assuming myReducer.py is made executable. Disclaimer: I have not tried it, but I have passed similarly complex strings to -mapper and -reducer before.
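For concreteness, here is a minimal sketch of what such a reducer could look like. The point is the argument handling: the "1 2 3" arrive as sys.argv[1:]. The summing logic and the use of the first argument as a threshold are my own illustrative assumptions, not something stated in the answer:

#!/usr/bin/env python
# myReducer.py - minimal sketch; '1 2 3' from -reducer 'myReducer.py 1 2 3'
# arrive as sys.argv[1:]
import sys

args = sys.argv[1:]        # ['1', '2', '3']
threshold = int(args[0])   # hypothetical use of the first parameter

current_key, total = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key != current_key:
        if current_key is not None and total >= threshold:
            print("%s\t%d" % (current_key, total))
        current_key, total = key, 0
    total += int(value)
if current_key is not None and total >= threshold:
    print("%s\t%d" % (current_key, total))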
That said, have you tried the -cmdenv name=value option, and just have your Python reducer get its value from the environment? It's just another way to do things.
In your Python code,

import os
(...)
os.environ["PARAM_OPT"]
In your Hadoop command, include:
hadoop jar \
(...)
-cmdenv PARAM_OPT=value \
(...)
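Filled out, a reducer using this approach might look like the sketch below. PARAM_OPT comes straight from os.environ; the fallback default and the pass-through logic are illustrative assumptions on my part:

#!/usr/bin/env python
# Reducer sketch reading its parameter from the environment; PARAM_OPT is
# set by -cmdenv PARAM_OPT=value on the hadoop command line.
import os
import sys

param_opt = os.environ.get("PARAM_OPT", "default")  # assumed fallback

for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    # use param_opt however the job needs; here it is simply echoed
    print("%s\t%s\t%s" % (key, value, param_opt))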
You can pass arguments to -reducer as in the command below:
hadoop jar hadoop-streaming.jar \
    -mapper 'count_mapper.py arg1 arg2' -file count_mapper.py \
    -reducer 'count_reducer.py arg3' -file count_reducer.py
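As a sketch of how count_mapper.py might consume arg1 and arg2: the command above does not say what the arguments mean, so here they are assumed, purely for illustration, to be a field index and a delimiter:

#!/usr/bin/env python
# count_mapper.py - hypothetical sketch; arg1 and arg2 are assumed to be
# a field index and a delimiter (the original answer does not specify).
import sys

field = int(sys.argv[1])   # arg1
delim = sys.argv[2]        # arg2

for line in sys.stdin:
    parts = line.rstrip("\n").split(delim)
    if len(parts) > field:
        print("%s\t1" % parts[field])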