MapReduce - How to sort reduce output by value


The best way to do it is to use the output of your first MapReduce job as the input to a second job, which I call Sort.java. Since Hadoop sorts the map output by key during the shuffle, you don't even need to write a reduce class; the default identity reducer just passes the sorted records through. Just do something like this:

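// Imports at the top of Sort.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested inside the Sort class
public static class Map extends Mapper<LongWritable, Text, IntWritable, Text> {
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line holds a word followed by its number
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        word.set(tokenizer.nextToken());
        IntWritable number = new IntWritable(Integer.parseInt(tokenizer.nextToken()));
        // Emit the number as the key so Hadoop sorts on it
        context.write(number, word);
    }
}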

That will sort the output of your first MapReduce job by its integer value, in ascending order, because the mapper emits that value as the IntWritable key. Let me know how it works!
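To see what the second job produces without a cluster, here is a minimal plain-Java sketch that mirrors the mapper's key/value swap and uses an in-memory sort as a stand-in for the shuffle. The class name SortSketch, the method sortByCount, and the sample lines are hypothetical and made up for illustration; nothing here calls Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.StringTokenizer;

public class SortSketch {
    // Mirrors the mapper: parse "word number" and emit "number<TAB>word".
    static List<String> sortByCount(List<String> lines) {
        List<String[]> pairs = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            String word = tokenizer.nextToken();
            String number = tokenizer.nextToken();
            pairs.add(new String[] {number, word});
        }
        // Stand-in for the shuffle: order the pairs by numeric key, ascending.
        pairs.sort(Comparator.comparingInt(p -> Integer.parseInt(p[0])));

        List<String> out = new ArrayList<>();
        for (String[] p : pairs) {
            out.add(p[0] + "\t" + p[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical output of the first job, one "word<TAB>number" per line.
        List<String> firstJobOutput = Arrays.asList("apple\t3", "pear\t1", "fig\t2");
        for (String line : sortByCount(firstJobOutput)) {
            System.out.println(line);
        }
    }
}
```

For the sample input, the lines come out with pear first, then fig, then apple, which is the ascending order the Sort job would be expected to write.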

CL


Per the docs, the Reducer's output is not re-sorted. Either sort the input to the reducer (if that works for your application) by setting appropriate comparators, for example JobConf.setOutputKeyComparatorClass(Class) for the sort order together with JobConf.setOutputValueGroupingComparator(Class) for grouping, or just sort the final output from the reducer in a separate step.
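A rough plain-Java sketch of why the separate step is needed: a TreeMap stands in for reducer input that arrives ordered by key, so a word count comes out in word order, and the values only become ordered after an explicit second pass. The names KeyOrderSketch, countWords, and sortByValue are hypothetical, and the TreeMap is only an analogy for the framework's key ordering, not Hadoop code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyOrderSketch {
    // TreeMap iterates in key order, like records reaching and leaving a reducer.
    static TreeMap<String, Integer> countWords(String text) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String w : text.split("\\s+")) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    // The "separate step": re-order the finished output by value, ascending.
    static List<Map.Entry<String, Integer>> sortByValue(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        return entries;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> counts = countWords("b a b c b c");
        System.out.println(counts);              // {a=1, b=3, c=2}  (key order)
        System.out.println(sortByValue(counts)); // [a=1, c=2, b=3]  (value order)
    }
}
```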