Debug MapReduce (of Hadoop 2.2 or higher) in Eclipse
Here are the steps I used to set this up in Eclipse. Environment: Ubuntu 16.04.2, Eclipse Neon.3 Release (4.6.3RC2), jdk1.8.0_121. I did a fresh hadoop-2.7.3 installation under /j01/srv/hadoop, which is my $HADOOP_HOME; replace $HADOOP_HOME with your actual path wherever it is referenced below. To run Hadoop from Eclipse you do not need any Hadoop configuration; what is really needed is to pull the right set of Hadoop jars into Eclipse.
Step 1 Create new Java Project
File > New > Project...
Select Java Project, Next
Enter Project name: hadoopmr
Click Configure default...
Source folder name: src/main/java
Output folder name: target/classes
Click Apply, OK, then Next
Click tab Libraries
Click Add External JARs...
Browse to the Hadoop installation folder and add the following jars; when done, click Finish:
$HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.3.jar
$HADOOP_HOME/share/hadoop/common/hadoop-nfs-2.7.3.jar
$HADOOP_HOME/share/hadoop/common/lib/avro-1.7.4.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-collections-3.2.2.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-configuration-1.6.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-io-2.4.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-lang-2.6.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-logging-1.1.3.jar
$HADOOP_HOME/share/hadoop/common/lib/hadoop-auth-2.7.3.jar
$HADOOP_HOME/share/hadoop/common/lib/httpclient-4.2.5.jar
$HADOOP_HOME/share/hadoop/common/lib/httpcore-4.2.5.jar
$HADOOP_HOME/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar
$HADOOP_HOME/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar
$HADOOP_HOME/share/hadoop/common/lib/log4j-1.2.17.jar
$HADOOP_HOME/share/hadoop/common/lib/slf4j-api-1.7.10.jar
$HADOOP_HOME/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.3.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.3.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.7.3.jar
$HADOOP_HOME/share/hadoop/mapreduce/lib-examples/hsqldb-2.0.0.jar
$HADOOP_HOME/share/hadoop/tools/lib/guava-11.0.2.jar
$HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-api-2.7.3.jar
$HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-common-2.7.3.jar
Step 2 Create a MapReduce example
Create a new package: org.apache.hadoop.examples
Create WordCount.java under package org.apache.hadoop.examples with the following contents:
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
      System.err.println("Usage: wordcount <in> [<in>...] <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    FileOutputFormat.setOutputPath(job,
        new Path(otherArgs[otherArgs.length - 1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
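Before wiring anything into Hadoop, the map and reduce logic above can be sketched as a plain-Java dry run. This is a simplified simulation, not the real framework: a TreeMap stands in for the shuffle/sort phase, and the reduce-side summation is collapsed into a merge call. The class name WordCountDryRun is my own, for illustration only.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountDryRun {

    // Simulate map (tokenize, emit (token, 1)) and reduce (sum per key)
    // in one pass; the TreeMap plays the role of Hadoop's sorted shuffle.
    public static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {
            "What do you mean by Object",
            "What is Java Virtual Machine",
            "How to create Java Object",
            "How Java enabled High Performance"
        };
        // Print in the same "word<TAB>count" layout as a reducer output file
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running this against the sample input below lets you predict what the job's part-r-00000 should contain before you ever attach the debugger.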
Create input.txt under /home/hadoop/input/ (or your path) with the following contents:
What do you mean by Object
What is Java Virtual Machine
How to create Java Object
How Java enabled High Performance
Step 3 Setup Debug Configuration
In Eclipse, open WordCount.java, set breakpoints in places you like.
Right click on WordCount.java, Debug As > Debug Configurations...
Select Java Application, then click the New launch configuration icon at the top left
Enter org.apache.hadoop.examples.WordCount in Main class box
Click Arguments tab
Enter the following into Program arguments:
/home/hadoop/input/input.txt /home/hadoop/output
Click Apply, then Debug
The program starts along with Hadoop and should hit the breakpoints you set.
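One pitfall when re-running in the debugger: Hadoop's FileOutputFormat refuses to start a job if the output directory (/home/hadoop/output here) already exists. A small helper like the following, run before each debug session, avoids that error. CleanOutputDir is my own convenience class, not part of Hadoop:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class CleanOutputDir {

    // Recursively delete a local directory so repeated debug runs do not
    // fail with "Output directory ... already exists".
    public static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Delete children before parents (deepest paths first)
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        deleteRecursively(Paths.get("/home/hadoop/output"));
    }
}
```

Equivalently, `rm -rf /home/hadoop/output` from a terminal does the same job; the Java version is just convenient to keep in the Eclipse project.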
Check the results:
$ ls -l /home/hadoop/output
-rw-r--r-- 1 hadoop hadoop 131 Apr 5 22:59 part-r-00000
-rw-r--r-- 1 hadoop hadoop   0 Apr 5 22:59 _SUCCESS
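If you prefer to verify the results from Java rather than the shell, part-r-00000 is plain text with one "word<TAB>count" pair per line, so it can be parsed with a few lines of code. ReadResults is a helper sketch of my own, assuming the output path used above:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class ReadResults {

    // Parse a reducer output file: each line is "word<TAB>count".
    public static Map<String, Integer> parse(Path partFile) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : Files.readAllLines(partFile)) {
            String[] kv = line.split("\t");
            counts.put(kv[0], Integer.parseInt(kv[1]));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parse(Paths.get("/home/hadoop/output/part-r-00000")));
    }
}
```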
Notes:
1) If the program does not run, make sure Project > Build Automatically is checked.
Use Project > Clean... to force a rebuild.
2) You can get more examples from
jar xvf $HADOOP_HOME/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.3-sources.jar
Copy them into this project to continue exploring.
3) You can download this eclipse project from
git clone https://github.com/drachenrio/hadoopmr
In Eclipse, File > Import... > Existing Projects into Workspace > Next
Browse to the cloned project and import it
Open .classpath and replace /j01/srv/hadoop-2.7.3 with your Hadoop installation home