
sorting in map reduce


The keys are sorted as they come into the reduce phase, but the values within each key's value set are not.
There is no guarantee about the order of the values passed into the reducer; that is not how Hadoop works.
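To make this concrete, here is a minimal plain-Java simulation of the grouping the shuffle performs. It uses no Hadoop classes, and `ShuffleSimulation` and `shuffle` are hypothetical names. A `TreeMap` stands in for the framework's key sort, while each key's values simply keep the order in which they arrive, which in a real job is not something you can rely on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the shuffle (no Hadoop dependency): keys come out
// sorted, but each key's values keep whatever order they happened to arrive in.
class ShuffleSimulation {

    // Groups (key, value) pairs by key. TreeMap keeps the keys sorted, mimicking
    // the framework's key sort; the per-key lists only preserve arrival order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Hypothetical map output, in the order the map tasks happened to emit it
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("b", 7), Map.entry("a", 3), Map.entry("b", 2),
                Map.entry("a", 9), Map.entry("a", 1));
        shuffle(mapOutput).forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

Running `main` prints `a -> [3, 9, 1]` and then `b -> [7, 2]`: the keys are in order, the values are not.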

Your problem is, as you say, a simple one in many other frameworks and paradigms, but it is not an easy (or appropriate) problem for MapReduce.


One solution is to use more complex keys so that the output is in the order you want from the start. Another is to pass the output through a secondary-sort MapReduce job that builds composite keys from the key and each individual value.
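A minimal plain-Java sketch of the composite-key (secondary sort) idea, assuming string keys and integer values. It simulates the map, sort and grouping steps in memory instead of using Hadoop's API, and `SecondarySortSketch`, `CompositeKey` and `secondarySort` are hypothetical names. The point it shows: once the value is part of the key, the framework's key sort also orders the values, and grouping on the natural key alone hands each reduce call its values already sorted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of secondary sort (no Hadoop types): the value is copied
// into a composite key, sorting uses (natural key, value), and grouping
// afterwards looks only at the natural key.
class SecondarySortSketch {

    // Hypothetical composite key: the original key plus the value to be sorted
    static final class CompositeKey {
        final String naturalKey;
        final int value;

        CompositeKey(String naturalKey, int value) {
            this.naturalKey = naturalKey;
            this.value = value;
        }
    }

    // Sort order: natural key first, then value
    static final Comparator<CompositeKey> SORT_ORDER =
            Comparator.comparing((CompositeKey k) -> k.naturalKey).thenComparingInt(k -> k.value);

    static Map<String, List<Integer>> secondarySort(List<Map.Entry<String, Integer>> mapOutput) {
        // "Map" step: build a composite key from each key and its individual value
        List<CompositeKey> keys = new ArrayList<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            keys.add(new CompositeKey(pair.getKey(), pair.getValue()));
        }
        // "Shuffle" step: sort on the full composite key
        keys.sort(SORT_ORDER);
        // "Group" step: consecutive keys sharing a natural key form one reduce call,
        // so each group's values arrive in ascending order
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (CompositeKey k : keys) {
            groups.computeIfAbsent(k.naturalKey, n -> new ArrayList<>()).add(k.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("b", 7), Map.entry("a", 3), Map.entry("b", 2),
                Map.entry("a", 9), Map.entry("a", 1));
        secondarySort(mapOutput).forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

For the sample map output in `main`, this prints `a -> [1, 3, 9]` and then `b -> [2, 7]`.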


The order of the values in the reducer's input is not guaranteed.

You can do the sort with a second MapReduce program, or you can use a comparator. Here is a nice blog post addressing this case: https://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/
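Besides the sort order itself, the comparator route needs a grouping comparator and a partitioner, and both must look only at the natural key. Below is a plain-Java sketch of these pieces using `java.util.Comparator` rather than Hadoop's raw comparators; `GroupingAndPartitioning`, `CompositeKey` and `partition` are hypothetical names. The partition function uses the same `(hashCode & Integer.MAX_VALUE) % numReducers` formula that Hadoop's default `HashPartitioner` applies to the whole key, but hashes only the natural key. In a real job the equivalents would be registered with `Job.setSortComparatorClass`, `Job.setGroupingComparatorClass` and `Job.setPartitionerClass`.

```java
import java.util.Comparator;

// Plain-Java sketch of the sort comparator, grouping comparator and partitioner
// used for secondary sort. Grouping and partitioning ignore the value part.
class GroupingAndPartitioning {

    // Hypothetical composite key (natural key + the value to be sorted)
    static final class CompositeKey {
        final String naturalKey;
        final int value;

        CompositeKey(String naturalKey, int value) {
            this.naturalKey = naturalKey;
            this.value = value;
        }
    }

    // Sort comparator: natural key first, then value
    static final Comparator<CompositeKey> SORT_COMPARATOR =
            Comparator.comparing((CompositeKey k) -> k.naturalKey).thenComparingInt(k -> k.value);

    // Grouping comparator: keys that differ only in value compare as equal,
    // so they are delivered to the same reduce call
    static final Comparator<CompositeKey> GROUPING_COMPARATOR =
            Comparator.comparing((CompositeKey k) -> k.naturalKey);

    // Partitioner: hash only the natural key so every composite key
    // for that natural key reaches the same reducer
    static int partition(CompositeKey key, int numReducers) {
        return (key.naturalKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        CompositeKey a3 = new CompositeKey("a", 3);
        CompositeKey a1 = new CompositeKey("a", 1);
        System.out.println(SORT_COMPARATOR.compare(a3, a1) > 0);      // sorted apart: prints true
        System.out.println(GROUPING_COMPARATOR.compare(a3, a1) == 0); // same group: prints true
        System.out.println(partition(a3, 4) == partition(a1, 4));     // same reducer: prints true
    }
}
```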


You can construct a value type that also contains the column index:

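```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class ColumnValue implements Writable {
    public long column;   // column index, serialized with writeLong/readLong
    public double value;

    // Hadoop creates Writables reflectively, so a no-arg constructor is required
    public ColumnValue() {
    }

    public ColumnValue(long column, double value) {
        this.column = column;
        this.value = value;
    }

    // readFields must read the fields in the same order and types that write wrote them
    @Override
    public void readFields(DataInput in) throws IOException {
        this.column = in.readLong();
        this.value = in.readDouble();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(column);
        out.writeDouble(value);
    }

    @Override
    public String toString() {
        return column + " " + value;
    }
}
```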
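`write` and `readFields` have to mirror each other exactly: same fields, same order, same types. That can be checked without a cluster. This sketch copies the serialization logic into a hypothetical `PlainColumnValue` class (dropping `implements Writable` so it compiles without the Hadoop jar) and pushes one value through in-memory streams, which is roughly what happens to it between map and reduce.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Round-trip check for a ColumnValue-style value, without Hadoop on the classpath.
class ColumnValueRoundTrip {

    // Hypothetical stand-in with the same fields and write/readFields logic
    static final class PlainColumnValue {
        long column;
        double value;

        PlainColumnValue() {
        }

        PlainColumnValue(long column, double value) {
            this.column = column;
            this.value = value;
        }

        void write(DataOutput out) throws IOException {
            out.writeLong(column);
            out.writeDouble(value);
        }

        void readFields(DataInput in) throws IOException {
            column = in.readLong();
            value = in.readDouble();
        }
    }

    // Serializes a value to bytes, then rebuilds it via no-arg constructor + readFields
    static PlainColumnValue roundTrip(PlainColumnValue original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        original.write(out);
        out.flush();
        PlainColumnValue copy = new PlainColumnValue();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        PlainColumnValue copy = roundTrip(new PlainColumnValue(4L, 2.5));
        System.out.println(copy.column + " " + copy.value); // prints 4 2.5
    }
}
```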

You can then use it in the reducer like this:

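```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Reducer;

// Class name is illustrative; the reduce method is the relevant part
public class ColumnOrderReducer extends Reducer<LongWritable, ColumnValue, LongWritable, ColumnValue> {

    @Override
    public void reduce(LongWritable key, Iterable<ColumnValue> values, Context context)
            throws IOException, InterruptedException {
        // Buffer all values of this key. Hadoop reuses the same ColumnValue instance
        // while iterating, so store a copy rather than the reference itself.
        List<ColumnValue> orderedByColumn = new ArrayList<>();
        for (ColumnValue val : values) {
            orderedByColumn.add(new ColumnValue(val.column, val.value));
        }
        // Order by column index (any ordered structure would do)
        orderedByColumn.sort(Comparator.comparingLong(cv -> cv.column));
        for (ColumnValue cv : orderedByColumn) {
            context.write(key, cv);
        }
    }
}
```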
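The buffer-and-sort step in the reducer can be tried without Hadoop. In this plain-Java sketch, `InReducerSortSketch`, `Cell` and `bufferAndSort` are hypothetical names, and a single `Cell` object is refilled for each incoming value to imitate Hadoop reusing its value instance during iteration. Storing that shared reference instead of a copy would leave every buffered entry holding the last value. Note that this approach keeps all values of one key in memory, so it only suits keys whose value lists fit in the reducer's heap; otherwise secondary sort is the safer choice.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java sketch of the reducer's buffer-and-sort step, including the copy
// needed because one value object is reused for every iteration.
class InReducerSortSketch {

    static final class Cell {
        long column;
        double value;

        Cell(long column, double value) {
            this.column = column;
            this.value = value;
        }
    }

    // Simulates the reduce-side iterator: the same Cell instance is refilled for
    // each incoming value; a fresh copy is buffered, then sorted by column.
    static List<Cell> bufferAndSort(long[] columns, double[] values) {
        Cell reused = new Cell(0, 0);
        List<Cell> buffered = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            reused.column = columns[i];
            reused.value = values[i];
            buffered.add(new Cell(reused.column, reused.value)); // copy, don't store `reused`
        }
        buffered.sort(Comparator.comparingLong(c -> c.column));
        return buffered;
    }

    public static void main(String[] args) {
        List<Cell> row = bufferAndSort(new long[] {2, 0, 1}, new double[] {0.2, 0.0, 0.1});
        for (Cell c : row) {
            System.out.println(c.column + " " + c.value); // 0 0.0, then 1 0.1, then 2 0.2
        }
    }
}
```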