
Hadoop MapReduce multiple input files


Based on the stacktrace, your output directory is not empty. So the simplest thing is actually to delete it before running the job:

bin/hadoop fs -rmr /user/cloudera/capital/output
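(On recent Hadoop releases -rmr is deprecated; hadoop fs -rm -r /user/cloudera/capital/output does the same thing.)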

Besides that, your arguments start with the class name of your main class, org.myorg.Capital, so that is the argument at index zero (based on the stack trace and the code you have provided).

Basically you need to shift all your indices one to the right:

Path cityInputPath = new Path(args[1]);
Path countryInputPath = new Path(args[2]);
Path outputPath = new Path(args[3]);

MultipleInputs.addInputPath(job, countryInputPath, TextInputFormat.class, JoinCountryMapper.class);
MultipleInputs.addInputPath(job, cityInputPath, TextInputFormat.class, JoinCityMapper.class);
FileOutputFormat.setOutputPath(job, outputPath);

Don't forget to clear your output folder though!
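For reference, a complete driver with the shifted indices could look like the sketch below. The mapper bodies are placeholders, and the job name, output key/value types, and the Job.getInstance call are assumptions on my part, so adjust them to match your actual code:

package org.myorg;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Capital {

    // Placeholder mappers standing in for the ones from your code
    public static class JoinCityMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // city-specific parsing goes here
        }
    }

    public static class JoinCountryMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // country-specific parsing goes here
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "capital");
        job.setJarByClass(Capital.class);

        // args[0] holds the class name here, so the paths start at index 1
        Path cityInputPath = new Path(args[1]);
        Path countryInputPath = new Path(args[2]);
        Path outputPath = new Path(args[3]);

        // each input path gets its own mapper; no setMapperClass call needed
        MultipleInputs.addInputPath(job, countryInputPath, TextInputFormat.class, JoinCountryMapper.class);
        MultipleInputs.addInputPath(job, cityInputPath, TextInputFormat.class, JoinCityMapper.class);
        FileOutputFormat.setOutputPath(job, outputPath);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}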

Also, a small tip for you: you can separate the files with a comma (",") and set them with a single call like this:

hadoop jar capital.jar org.myorg.Capital /user/cloudera/capital/input/City.dat,/user/cloudera/capital/input/Country.dat

And in your java code:

FileInputFormat.addInputPaths(job, args[1]);
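One caveat with this shortcut: FileInputFormat.addInputPaths feeds every file through the job's single mapper class, so unlike MultipleInputs you can no longer give City.dat and Country.dat different mappers.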


What is happening here is that the class name is deemed to be the first argument!

By default, the first non-option argument is the name of the class to be invoked. A fully-qualified class name should be used. If the -jar option is specified, the first non-option argument is the name of a JAR archive containing class and resource files for the application, with the startup class indicated by the Main-Class manifest header.
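If you want to verify this on your cluster, a throwaway class like the hypothetical ArgsProbe below just prints every argument with its index; running it via hadoop jar shows you exactly where the class name lands:

public class ArgsProbe {
    public static void main(String[] args) {
        // print each argument with its index to see what main() actually receives
        for (int i = 0; i < args.length; i++) {
            System.out.println("args[" + i + "] = " + args[i]);
        }
    }
}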

So what I would suggest is that you add a manifest file to your jar in which you specify the main class. Your MANIFEST.MF file may look like:

Manifest-Version: 1.0
Main-Class: org.myorg.Capital
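One gotcha: make sure the manifest file ends with a newline, otherwise the last line can be silently ignored. Assuming your compiled classes live under classes/, you can then package the jar with that manifest like this:

jar cfm capital.jar MANIFEST.MF -C classes .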

And now your command would look like:

hadoop jar capital.jar /user/cloudera/capital/input/City.dat /user/cloudera/capital/input/Country.dat /user/cloudera/capital/output

You can certainly just change the index values used in your code, but that's not an advisable solution.


Can you try this:

hadoop jar capital.jar /user/cloudera/capital/input /user/cloudera/capital/output

This should read all files in the single input directory.
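If you go this route, the driver needs just one input call; assuming the paths end up at indices 0 and 1 (that depends on how the class name is passed, as discussed above), it would be:

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

Note that by default FileInputFormat skips hidden files (names starting with . or _) and reads every other file in the directory, all through the same mapper class.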