Debug MapReduce (of Hadoop 2.2 or higher) in Eclipse
Here are the steps I used to set this up in Eclipse. Environment: Ubuntu 16.04.2, Eclipse Neon.3 Release (4.6.3RC2), jdk1.8.0_121. I did a fresh hadoop-2.7.3 installation under /j01/srv/hadoop, which is my $HADOOP_HOME; replace $HADOOP_HOME with your actual path wherever it is referenced below. To run Hadoop from Eclipse you do not need any Hadoop configuration; what is really needed is to pull the right set of Hadoop jars into Eclipse.
Step 1 Create new Java Project
File > New > Project...
Select Java Project, Next
Enter Project name: hadoopmr
Click Configure default...
Source folder name: src/main/java
Output folder name: target/classes
Click Apply, OK, then Next
Click tab Libraries
Click Add External JARs...
Browse to the Hadoop installation folder and add the following jars; when done, click Finish:
$HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.3.jar
$HADOOP_HOME/share/hadoop/common/hadoop-nfs-2.7.3.jar
$HADOOP_HOME/share/hadoop/common/lib/avro-1.7.4.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-collections-3.2.2.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-configuration-1.6.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-io-2.4.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-lang-2.6.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-logging-1.1.3.jar
$HADOOP_HOME/share/hadoop/common/lib/hadoop-auth-2.7.3.jar
$HADOOP_HOME/share/hadoop/common/lib/httpclient-4.2.5.jar
$HADOOP_HOME/share/hadoop/common/lib/httpcore-4.2.5.jar
$HADOOP_HOME/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar
$HADOOP_HOME/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar
$HADOOP_HOME/share/hadoop/common/lib/log4j-1.2.17.jar
$HADOOP_HOME/share/hadoop/common/lib/slf4j-api-1.7.10.jar
$HADOOP_HOME/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.3.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.3.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.7.3.jar
$HADOOP_HOME/share/hadoop/mapreduce/lib-examples/hsqldb-2.0.0.jar
$HADOOP_HOME/share/hadoop/tools/lib/guava-11.0.2.jar
$HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-api-2.7.3.jar
$HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-common-2.7.3.jar
Step 2 Create a MapReduce example
Create a new package: org.apache.hadoop.examples
Create WordCount.java under package org.apache.hadoop.examples with the following contents:
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
      System.err.println("Usage: wordcount <in> [<in>...] <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    FileOutputFormat.setOutputPath(job,
        new Path(otherArgs[otherArgs.length - 1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
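Before wiring anything into Hadoop, the map and reduce logic above can be sketched as a plain-Java dry run. This is a simplified simulation, not the real framework: a TreeMap stands in for the shuffle/sort phase, and the reduce-side summation is collapsed into a merge call. The class name WordCountDryRun is my own, for illustration only.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountDryRun {

    // Simulate map (tokenize, emit (token, 1)) and reduce (sum per key)
    // in one pass; the TreeMap plays the role of Hadoop's sorted shuffle.
    public static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {
            "What do you mean by Object",
            "What is Java Virtual Machine",
            "How to create Java Object",
            "How Java enabled High Performance"
        };
        // Print in the same "word<TAB>count" layout as a reducer output file
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running this against the sample input below lets you predict what the job's part-r-00000 should contain before you ever attach the debugger.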
Create input.txt under /home/hadoop/input/ (or your path) with the following contents:
What do you mean by Object
What is Java Virtual Machine
How to create Java Object
How Java enabled High Performance
Step 3 Setup Debug Configuration
In Eclipse, open WordCount.java, set breakpoints in places you like.
Right click on WordCount.java, Debug As > Debug Configurations...
Select Java Application, then click the New launch configuration icon at the top left
Enter org.apache.hadoop.examples.WordCount in Main class box
Click Arguments tab
Enter the following into Program arguments:
/home/hadoop/input/input.txt /home/hadoop/output
Click Apply, then Debug
The program starts along with Hadoop and should hit the breakpoints you set.
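One pitfall when re-running in the debugger: Hadoop's FileOutputFormat refuses to start a job if the output directory (/home/hadoop/output here) already exists. A small helper like the following, run before each debug session, avoids that error. CleanOutputDir is my own convenience class, not part of Hadoop:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class CleanOutputDir {

    // Recursively delete a local directory so repeated debug runs do not
    // fail with "Output directory ... already exists".
    public static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Delete children before parents (deepest paths first)
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        deleteRecursively(Paths.get("/home/hadoop/output"));
    }
}
```

Equivalently, `rm -rf /home/hadoop/output` from a terminal does the same job; the Java version is just convenient to keep in the Eclipse project.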
Check the results:
$ ls -l /home/hadoop/output
-rw-r--r-- 1 hadoop hadoop 131 Apr 5 22:59 part-r-00000
-rw-r--r-- 1 hadoop hadoop   0 Apr 5 22:59 _SUCCESS
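If you prefer to verify the results from Java rather than the shell, part-r-00000 is plain text with one "word<TAB>count" pair per line, so it can be parsed with a few lines of code. ReadResults is a helper sketch of my own, assuming the output path used above:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class ReadResults {

    // Parse a reducer output file: each line is "word<TAB>count".
    public static Map<String, Integer> parse(Path partFile) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : Files.readAllLines(partFile)) {
            String[] kv = line.split("\t");
            counts.put(kv[0], Integer.parseInt(kv[1]));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parse(Paths.get("/home/hadoop/output/part-r-00000")));
    }
}
```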
Notes:
1) If the program does not run, make sure Project > Build Automatically is checked.
Use Project > Clean... to force a rebuild.
2) You can get more examples from
jar xvf $HADOOP_HOME/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.3-sources.jar
Copy them into this project to continue exploring.
3) You can download this eclipse project from
git clone https://github.com/drachenrio/hadoopmr
In Eclipse, File > Import... > Existing Projects into Workspace > Next
Browse to the cloned project and import it
Open .classpath and replace /j01/srv/hadoop-2.7.3 with your Hadoop installation home