Running the Hadoop WordCount example with Groovy
I was able to run this Groovy file with Hadoop 2.7.1. The procedure I followed is:
- Install Gradle
- Generate the jar file using Gradle. I asked this question, which helped me build the dependencies in Gradle
- Run with Hadoop as usual, just as we run a Java jar file, using this command from the folder where the jar is located:

hadoop jar buildSrc-1.0.jar in1 out4

where in1 is the input file and out4 is the output folder in HDFS.
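For reference, a minimal build.gradle for this kind of job could look like the sketch below. This is an assumption on my part, not the exact build file I used: the Groovy and Hadoop version numbers and the jar-bundling approach should be adapted to your setup.

```groovy
// build.gradle — a minimal sketch (versions are assumptions, adjust as needed)
apply plugin: 'groovy'

version = '1.0'

repositories {
    mavenCentral()
}

dependencies {
    // Groovy runtime used to compile the job
    compile 'org.codehaus.groovy:groovy-all:2.4.5'
    // Hadoop client APIs, matching the cluster version
    compile 'org.apache.hadoop:hadoop-client:2.7.1'
}

// bundle the Groovy runtime classes into the job jar so that
// "hadoop jar" can resolve them on the cluster
jar {
    from {
        configurations.compile
            .findAll { it.name.startsWith('groovy-all') }
            .collect { zipTree(it) }
    }
}
```

With a project named buildSrc and version 1.0, `gradle jar` would produce a jar named like the buildSrc-1.0.jar used in the command above.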
EDIT: As the above link is broken, I am pasting the Groovy file here.
import StartsWithCountMapper
import StartsWithCountReducer
import org.apache.hadoop.conf.Configured
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.Mapper
import org.apache.hadoop.mapreduce.Reducer
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
import org.apache.hadoop.util.Tool
import org.apache.hadoop.util.ToolRunner

class CountGroovyJob extends Configured implements Tool {
    @Override
    int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "StartsWithCount")
        job.setJarByClass(getClass())

        // configure output and input source
        TextInputFormat.addInputPath(job, new Path(args[0]))
        job.setInputFormatClass(TextInputFormat)

        // configure mapper and reducer
        job.setMapperClass(StartsWithCountMapper)
        job.setCombinerClass(StartsWithCountReducer)
        job.setReducerClass(StartsWithCountReducer)

        // configure output
        TextOutputFormat.setOutputPath(job, new Path(args[1]))
        job.setOutputFormatClass(TextOutputFormat)
        job.setOutputKeyClass(Text)
        job.setOutputValueClass(IntWritable)

        return job.waitForCompletion(true) ? 0 : 1
    }

    static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new CountGroovyJob(), args))
    }

    class GroovyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable countOne = new IntWritable(1)
        private final Text reusableText = new Text()

        @Override
        protected void map(LongWritable key, Text value, Mapper.Context context) {
            value.toString().tokenize().each {
                reusableText.set(it)
                context.write(reusableText, countOne)
            }
        }
    }

    class GroovyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable outValue = new IntWritable()

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Reducer.Context context) {
            outValue.set(values.collect({ it.value }).sum())
            context.write(key, outValue)
        }
    }
}
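To see what the mapper and reducer compute without a cluster, here is the same word-count logic stripped of the Hadoop types, as a plain self-contained Java sketch (the class and method names are my own, not part of the job above): the mapper's tokenize-and-emit step and the reducer's per-key sum collapse into one counting loop.

```java
import java.util.*;

public class WordCountDemo {
    // Mimics GroovyMapper (split each line into tokens, emit (token, 1))
    // followed by GroovyReducer (sum the emitted counts per token).
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("hello world", "hello hadoop")));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```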
The library you are using, groovy-hadoop, says it supports Hadoop 0.20.2. That is really old. But the CountGroovyJob.groovy code you are trying to run looks like it is meant for versions 2.x.x of Hadoop: in its imports you can see packages such as org.apache.hadoop.mapreduce.Mapper, whereas before version 2 the equivalent class lived in org.apache.hadoop.mapred.Mapper.
The most-voted answer in the SO question you linked is probably the answer you need. You have an incompatibility problem: the groovy-hadoop library can't work with your Hadoop 2.7.1.
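To make the API split concrete, here is a rough, non-runnable sketch of the two Mapper shapes (signatures simplified from memory; check the Hadoop javadocs before relying on them):

```java
// old "mapred" API (pre-2.x): Mapper is an interface,
// output goes through an OutputCollector
// package org.apache.hadoop.mapred
interface Mapper<K1, V1, K2, V2> {
    void map(K1 key, V1 value,
             OutputCollector<K2, V2> output, Reporter reporter);
}

// new "mapreduce" API (used by CountGroovyJob): Mapper is a class,
// output goes through a Context object
// package org.apache.hadoop.mapreduce
class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    protected void map(KEYIN key, VALUEIN value, Context context) { /* ... */ }
}
```

A library compiled against the first shape cannot drive a job written against the second, which is why the version mismatch matters here.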