Split reduced data into output and new input in Hadoop Split reduced data into output and new input in Hadoop hadoop hadoop

Split reduced data into output and new input in Hadoop


Try using MultipleOutputs. As it's Javadoc suggests:

The MultipleOutputs class simplifies writing output data to multiple outputs

Case one: writing to additional outputs other than the job default output. Each additional output, or named output, may be configured with its own OutputFormat, with its own key class and with its own value class.

Case two: to write data to different files provided by user

Usage pattern for job submission:

Job job = new Job(); FileInputFormat.setInputPath(job, inDir); FileOutputFormat.setOutputPath(job, outDir); job.setMapperClass(MOMap.class); job.setReducerClass(MOReduce.class); ... // Defines additional single text based output 'text' for the job MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class, LongWritable.class, Text.class); // Defines additional sequence-file based output 'sequence' for the job MultipleOutputs.addNamedOutput(job, "seq",   SequenceFileOutputFormat.class,   LongWritable.class, Text.class); ... job.waitForCompletion(true); ...

Usage in Reducer:

 String generateFileName(K k, V v) {   return k.toString() + "_" + v.toString(); } public class MOReduce extends   Reducer<WritableComparable, Writable,WritableComparable, Writable> { private MultipleOutputs mos; public void setup(Context context) { ... mos = new MultipleOutputs(context); } public void reduce(WritableComparable key, Iterator<Writable> values, Context context) throws IOException { ... mos.write("text", , key, new Text("Hello")); mos.write("seq", LongWritable(1), new Text("Bye"), "seq_a"); mos.write("seq", LongWritable(2), key, new Text("Chau"), "seq_b"); mos.write(key, new Text("value"), generateFileName(key, new Text("value"))); ... } public void cleanup(Context) throws IOException { mos.close(); ... } }