MapReduce - How to sort reduce output by value
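The best way to do it is to use the output of your first MapReduce job as the input of a second job, which I call Sort.java. Since Hadoop sorts map output by key during the shuffle, you don't even need a custom reduce class: the default identity Reducer writes the sorted pairs out unchanged. (Just don't set the number of reduce tasks to 0, because map-only output skips the sort.) Have the mapper swap each (word, count) pair so the count becomes the key. Something like this: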
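import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested in Sort.java: turns "word<TAB>count" lines into (count, word) pairs
public static class Map extends Mapper<LongWritable, Text, IntWritable, Text> {
    private final Text word = new Text();
    private final IntWritable number = new IntWritable();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        word.set(tokenizer.nextToken());
        number.set(Integer.parseInt(tokenizer.nextToken()));
        context.write(number, word); // the count is the key Hadoop sorts on
    }
}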
That will sort the [Text, IntWritable] output of your first MapReduce job by the IntWritable count. Let me know how it works!
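If it helps to see the idea without a cluster, here is a Hadoop-free sketch of what the Sort job does to the data. `SortByValueSketch` and `sortByCount` are hypothetical names for illustration only, not part of any Hadoop API; the sketch assumes the first job wrote tab-separated `word<TAB>count` lines (the TextOutputFormat default), swaps each line into a (count, word) pair, and sorts by count the way the shuffle sorts IntWritable keys.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hadoop-free sketch of the Sort job. sortByCount is a hypothetical helper
// for illustration; it is not a Hadoop API.
public class SortByValueSketch {

    // Input lines look like "word<TAB>count" (assumed first-job format).
    // Returns "count<TAB>word" lines ordered by count.
    public static List<String> sortByCount(List<String> lines, boolean descending) {
        List<Map.Entry<Integer, String>> pairs = new ArrayList<>();
        for (String line : lines) {
            // Split on whitespace, like the StringTokenizer in the mapper.
            String[] parts = line.trim().split("\\s+");
            // Swap: the count becomes the key, the word becomes the value.
            pairs.add(new AbstractMap.SimpleEntry<>(Integer.parseInt(parts[1]), parts[0]));
        }

        // Stand-in for the shuffle, which sorts IntWritable keys ascending.
        Comparator<Map.Entry<Integer, String>> byCount = Map.Entry.comparingByKey();
        pairs.sort(descending ? byCount.reversed() : byCount);

        // Stand-in for the identity reducer plus TextOutputFormat ("key<TAB>value").
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> p : pairs) {
            out.add(p.getKey() + "\t" + p.getValue());
        }
        return out;
    }
}
```

In the real job, descending order would need a custom sort comparator registered with `Job.setSortComparatorClass`, and with more than one reduce task each `part-r-*` file is sorted on its own, so you would need a single reducer (or a `TotalOrderPartitioner`) to get one globally sorted file. Also note that Hadoop does not guarantee the order of words that share the same count.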
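Per the docs, the output of the Reducer is not re-sorted. Either sort the input to the reducer (if that works for your application) by setting appropriate comparators, JobConf.setOutputKeyComparatorClass(Class) together with JobConf.setOutputValueGroupingComparator(Class) (in the newer org.apache.hadoop.mapreduce API, Job.setSortComparatorClass and Job.setGroupingComparatorClass), or just sort the final output from the reducer in a separate step.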
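To show how the two comparators combine into a "secondary sort", here is a plain-Java simulation with no Hadoop dependency. The composite key, `SecondarySortSketch`, and `reduceInputs` are hypothetical illustration names. The sort comparator orders by the natural key and then by the value; the grouping comparator looks at the natural key only, so each simulated reduce call receives all values for one key, already sorted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hadoop-free sketch of secondary sort (hypothetical names throughout).
public class SecondarySortSketch {

    // Hypothetical composite map-output key; in Hadoop this would be a
    // WritableComparable carrying both the natural key and the value.
    public record Composite(String naturalKey, int value) {}

    // Role of the sort comparator: natural key first, then value.
    static final Comparator<Composite> SORT =
            Comparator.comparing(Composite::naturalKey).thenComparingInt(Composite::value);

    // Role of the grouping comparator: natural key only.
    static final Comparator<Composite> GROUP = Comparator.comparing(Composite::naturalKey);

    // Returns, per natural key, the values one reduce call would iterate over.
    public static Map<String, List<Integer>> reduceInputs(List<Composite> mapOutput) {
        List<Composite> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT);

        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        Composite groupStart = null;
        List<Integer> current = null;
        for (Composite c : sorted) {
            // A new reduce call starts whenever the grouping comparator differs.
            if (groupStart == null || GROUP.compare(groupStart, c) != 0) {
                groupStart = c;
                current = new ArrayList<>();
                groups.put(c.naturalKey(), current);
            }
            current.add(c.value());
        }
        return groups;
    }
}
```

In an actual job you would also need a partitioner that hashes only the natural key, so that every composite key for a given natural key reaches the same reducer.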