Pass Hadoop arguments into Java code
You can pass the arguments in two ways: with the generic -D option on the command line, or programmatically through the Configuration object. The -D option only works when your driver implements the Tool interface, so that ToolRunner can parse it; otherwise you have to set the configuration variables yourself with conf.set(). (Note that this -D is Hadoop's generic option, handled by GenericOptionsParser, not the JVM's -D system-property flag.)
Passing parameters using -D:
hadoop jar example.jar com.example.driver -D property=value /input/path /output/path
Passing parameters using Configuration:
Configuration conf = new Configuration();
conf.set("property", "value");
Job job = Job.getInstance(conf);
Note: all configuration variables have to be set before the Job object is created, because Job takes a copy of the Configuration at construction time; changes made to conf afterwards are not seen by the job.
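A minimal sketch of that pitfall (the property names here are just placeholders):

Configuration conf = new Configuration();
conf.set("before", "visible");              // set before the Job is created: the job sees it
Job job = Job.getInstance(conf);            // Job copies conf at this point
conf.set("after", "invisible");             // too late: the job's copy is unaffected
job.getConfiguration().set("late", "ok");   // works: modifies the job's own copy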
The driver class should implement the Tool interface, which allows you to use ToolRunner to run your MapReduce job:
public class MRDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        /*...*/
    }
}
Then you can run jobs the following way:
public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new MRDriver(), args);
    System.exit(res);
}
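ToolRunner also has an overload that takes an explicit Configuration, which is handy if you want to preset defaults before the command line is parsed; a minimal sketch (the preset value is a placeholder):

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("env1", "default1"); // preset default; a -D on the command line overrides it
    int res = ToolRunner.run(conf, new MRDriver(), args);
    System.exit(res);
}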
This means that all generic command-line parameters (such as -D) are parsed by ToolRunner into the current instance of the Configuration class.
Assuming you run the job from the console with the following command (generic options like -D must come before any application-specific arguments):
hadoop jar munge-data.jar -Denv1=prod1 -Denv2=prod2
Then, in the run() method, you can get all your arguments from the Configuration class:
public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    String env1 = conf.get("env1");
    String env2 = conf.get("env2");
    Job job = Job.getInstance(conf, "MR Job");
    job.setJarByClass(MRDriver.class);
    /*...*/
}
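The same values travel with the job configuration to every task, so a mapper can read them as well, for example in setup(). A minimal sketch (the class name and key/value types are placeholders):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class EnvMapper extends Mapper<LongWritable, Text, Text, Text> {

    private String env1;

    @Override
    protected void setup(Context context) {
        // Properties set via -D or conf.set() in the driver are visible
        // in every task through the task's Configuration.
        env1 = context.getConfiguration().get("env1", "default");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        /*...*/
    }
}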