
Hadoop GenericOptionsParser


If the jar you are using here (wordcount.jar) is hadoop-examples*.jar, then it is a runnable jar whose main class is org.apache.hadoop.examples.ExampleDriver.

The first argument is consumed by the driver if the example name you specify (wordcount, teragen, terasort, etc.) is a valid program name.

See the following method

org.apache.hadoop.util.ProgramDriver#driver(String[] args) 

After this initial filtering, the example class org.apache.hadoop.examples.WordCount is invoked with the remaining arguments (input output). org.apache.hadoop.examples.WordCount is not called directly.

Using GenericOptionsParser lets you specify generic options on the command line itself.

E.g., with generic options you can specify the following:
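The filtering described above can be sketched in plain Java. This is a simplified stand-in, not the Hadoop source: the class name DriverSketch and its addProgram and run methods are hypothetical, and the real ProgramDriver loads the example's main class by reflection rather than taking a lambda.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for the dispatch idea behind ProgramDriver.
public class DriverSketch {
    // Maps a program name (e.g. "wordcount") to the code that runs it.
    private final Map<String, Consumer<String[]>> programs = new LinkedHashMap<>();

    public void addProgram(String name, Consumer<String[]> main) {
        programs.put(name, main);
    }

    // Returns 0 if a program was dispatched, -1 if the name was missing or unknown.
    public int run(String[] args) {
        if (args.length == 0 || !programs.containsKey(args[0])) {
            System.out.println("Valid program names are: " + programs.keySet());
            return -1;
        }
        // Drop the program name; only the remaining arguments reach the program.
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        programs.get(args[0]).accept(rest);
        return 0;
    }

    public static void main(String[] args) {
        DriverSketch driver = new DriverSketch();
        driver.addProgram("wordcount",
                rest -> System.out.println("WordCount args: " + Arrays.toString(rest)));
        // Prints: WordCount args: [input, output]
        driver.run(new String[] {"wordcount", "input", "output"});
    }
}
```

The point of the sketch is that the example program never sees its own name, only what follows it.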

hadoop jar /home/hduser/WordCount/wordcount.jar WordCount -Dmapred.reduce.tasks=20 input output


Command usage is already explained.

The job of GenericOptionsParser is to separate the generic options from the user's command-line arguments such as input, output, and other options. Hadoop offers the following generic options:

-D key=value, -fs, -jt, -libjars, -files, etc.

This class not only separates generic options from the user's command-line arguments but also adds all these generic options to the Hadoop Configuration object that is created in the driver method of the MR program.
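To make the separation concrete, here is a minimal sketch in plain Java that handles only the -D option. It is an illustration, not the real parser: GenericOptionsSketch is a hypothetical class, a Map stands in for the Hadoop configuration, and the other generic options (-fs, -jt, -libjars, -files) are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of separating -D options from user arguments.
public class GenericOptionsSketch {
    // Stand-in for the configuration that receives the generic options.
    public final Map<String, String> conf = new LinkedHashMap<>();
    // Arguments left over for the user program (e.g. input and output paths).
    public final List<String> remaining = new ArrayList<>();

    public GenericOptionsSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("-D") && i + 1 < args.length) {
                put(args[++i]);              // form: -D key=value
            } else if (arg.startsWith("-D") && arg.length() > 2) {
                put(arg.substring(2));       // form: -Dkey=value
            } else {
                remaining.add(arg);          // not a -D option: pass through
            }
        }
    }

    // Stores "key=value" in conf; entries without a key or '=' are ignored.
    private void put(String keyValue) {
        int eq = keyValue.indexOf('=');
        if (eq > 0) {
            conf.put(keyValue.substring(0, eq), keyValue.substring(eq + 1));
        }
    }

    public static void main(String[] args) {
        GenericOptionsSketch parsed = new GenericOptionsSketch(
                new String[] {"-Dmapred.reduce.tasks=20", "input", "output"});
        System.out.println(parsed.conf);       // {mapred.reduce.tasks=20}
        System.out.println(parsed.remaining);  // [input, output]
    }
}
```

In a real job the same split means the driver can read input and output from the remaining arguments while the reducer count is already set in the configuration.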

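We can use Tool and ToolRunner instead of calling GenericOptionsParser directly; ToolRunner applies GenericOptionsParser for you before invoking the tool's run method.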


You are executing the jar file via the hadoop jar command. If you look at the syntax: hadoop jar <jar> [mainClass] args

So for your command:

jar = wordcount.jar (the jar file to run)
mainClass = WordCount (the name of the class that contains your main function; note that this is not an actual argument to your program, only a hint as to which class contains main)
input = an argument to your program
output = also an argument to your program
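The two answers fit together once you consider how the main class is chosen: if the jar's manifest names a Main-Class, that class is used and every token after the jar is passed to it as an argument; otherwise the first token after the jar is taken as the main class. A minimal sketch of that decision in plain Java follows. RunJarSketch and resolve are hypothetical names, and the real launcher reads the manifest from the jar file rather than taking it as a parameter.

```java
import java.util.Arrays;

// Hypothetical model of how "hadoop jar <jar> [mainClass] args" picks the main class.
public class RunJarSketch {
    // manifestMainClass: Main-Class from the jar manifest, or null if absent.
    // afterJar: every command-line token that follows the jar path.
    // Returns the chosen main class followed by the arguments its main() receives.
    public static String[] resolve(String manifestMainClass, String[] afterJar) {
        int first = 0;
        String mainClass = manifestMainClass;
        if (mainClass == null) {
            if (afterJar.length == 0) {
                throw new IllegalArgumentException("no main class given");
            }
            // No Main-Class in the manifest: the next token names the class.
            mainClass = afterJar[first++];
        }
        String[] result = new String[afterJar.length - first + 1];
        result[0] = mainClass;
        System.arraycopy(afterJar, first, result, 1, afterJar.length - first);
        return result;
    }

    public static void main(String[] args) {
        // Plain jar without Main-Class: WordCount is the class, not an argument.
        System.out.println(Arrays.toString(
                resolve(null, new String[] {"WordCount", "input", "output"})));
        // Runnable examples jar: the example name is passed on to the driver.
        System.out.println(Arrays.toString(
                resolve("org.apache.hadoop.examples.ExampleDriver",
                        new String[] {"wordcount", "input", "output"})));
    }
}
```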