How to directly send the output of one mapper-reducer to another mapper-reducer without saving the output into HDFS [hadoop]

How can I directly send the output of one mapper-reducer to another mapper-reducer without saving the output into HDFS?


You need to explicitly configure the output of the first job to use the SequenceFileOutputFormat and define the output key and value classes:

job.setOutputFormat(SequenceFileOutputFormat.class);
job.setOutputKeyClass(VarLongWritable.class);
job.setOutputValueClass(VectorWritable.class);

Without seeing your driver code, I'm guessing you're using TextOutputFormat as the output of the first job and TextInputFormat as the input to the second - and that input format sends <LongWritable, Text> pairs (byte offset and line) to the second mapper, losing the key/value types the first job produced.
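For example, here is a minimal, self-contained sketch of a driver that chains two jobs this way, using the old org.apache.hadoop.mapred API to match the snippet above. The IdentityMapper/IdentityReducer classes, the LongWritable/Text types and the intermediate path name are stand-ins chosen for illustration, not taken from the question (in the Mahout case the output classes would be VarLongWritable and VectorWritable):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class TwoJobDriver {
    public static void main(String[] args) throws Exception {
        // Intermediate directory written by job 1 and read by job 2 (made-up name)
        Path intermediate = new Path(args[1] + "-intermediate");

        // Job 1: write its <LongWritable, Text> pairs as a SequenceFile,
        // so the key/value classes survive between the two jobs.
        JobConf job1 = new JobConf(TwoJobDriver.class);
        job1.setJobName("job 1");
        FileInputFormat.setInputPaths(job1, new Path(args[0]));
        FileOutputFormat.setOutputPath(job1, intermediate);
        job1.setMapperClass(IdentityMapper.class);     // stand-in mapper
        job1.setReducerClass(IdentityReducer.class);   // stand-in reducer
        job1.setOutputFormat(SequenceFileOutputFormat.class);
        job1.setOutputKeyClass(LongWritable.class);
        job1.setOutputValueClass(Text.class);
        JobClient.runJob(job1);

        // Job 2: read the SequenceFile back, so its mapper receives the same
        // <LongWritable, Text> pairs instead of lines re-parsed by TextInputFormat.
        JobConf job2 = new JobConf(TwoJobDriver.class);
        job2.setJobName("job 2");
        FileInputFormat.setInputPaths(job2, intermediate);
        FileOutputFormat.setOutputPath(job2, new Path(args[1]));
        job2.setInputFormat(SequenceFileInputFormat.class);
        job2.setMapperClass(IdentityMapper.class);     // stand-in mapper
        job2.setReducerClass(IdentityReducer.class);   // stand-in reducer
        job2.setOutputKeyClass(LongWritable.class);
        job2.setOutputValueClass(Text.class);
        JobClient.runJob(job2);
    }
}

Because job 2 reads the intermediate directory with SequenceFileInputFormat, its mapper receives exactly the key and value classes that job 1 wrote.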


I am a beginner in Hadoop, so this is just my guess at an answer; please bear with me and point it out if it seems naive.

I think it's not reasonable to send data from a reducer straight to the next mapper without saving it on HDFS, because the assignment of "which split of data goes to which mapper" is deliberately designed to meet the data-locality criterion (each split goes to a mapper on a node that has that data stored locally).

If you don't store it on HDFS, most likely all the data will have to be transferred over the network, which is slow and may cause bandwidth problems.


You have to temporarily save the output of the first map-reduce job so that the second one can use it.

This might help you understand how the output of the first map-reduce job is passed to the second one (this is based on Generator.java from Apache Nutch).

This is the temporary dir for the output of the first map-reduce job:

Path tempDir = new Path(getConf().get("mapred.temp.dir", ".")
        + "/job1-temp-"
        + Integer.toString(new Random().nextInt(Integer.MAX_VALUE)));

Setting up first map-reduce job:

JobConf job1 = new JobConf(getConf());
job1.setJobName("job 1");
FileInputFormat.addInputPath(...);
job1.setMapperClass(...);
FileOutputFormat.setOutputPath(job1, tempDir);
job1.setOutputFormat(SequenceFileOutputFormat.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(...);
JobClient.runJob(job1);

Observe that the output dir is set in the job configuration. Use that same directory as the input of the 2nd job:

JobConf job2 = new JobConf(getConf());
FileInputFormat.addInputPath(job2, tempDir);
job2.setReducerClass(...);
JobClient.runJob(job2);

Remember to clean up the temp dirs after you are done:

// clean up
FileSystem fs = FileSystem.get(getConf());
fs.delete(tempDir, true);
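As a small variation (not from the original code), you could wrap the second job and the cleanup in a try/finally so the temp dir is removed even if job 2 fails:

FileSystem fs = FileSystem.get(getConf());
try {
    JobClient.runJob(job2);
} finally {
    // recursive delete of the intermediate output
    fs.delete(tempDir, true);
}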

Hope this helps.