How to directly send the output of one mapper-reducer to another mapper-reducer without saving the output into HDFS [hadoop]

How can I directly send the output of one mapper-reducer to another mapper-reducer without saving the output into HDFS?


You need to explicitly configure the output of the first job to use the SequenceFileOutputFormat and define the output key and value classes:

job.setOutputFormat(SequenceFileOutputFormat.class);
job.setOutputKeyClass(VarLongWritable.class);
job.setOutputValueClass(VectorWritable.class);

Without seeing your driver code, I'm guessing you're using TextOutputFormat as the output of the first job and TextInputFormat as the input to the second - and that input format sends <LongWritable, Text> pairs (byte offset and line) to the second mapper, losing the key/value types the first job produced.
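For example, here is a minimal, self-contained sketch of a driver that chains two jobs this way, using the old org.apache.hadoop.mapred API to match the snippet above. The IdentityMapper/IdentityReducer classes, the LongWritable/Text types and the intermediate path name are stand-ins chosen for illustration, not taken from the question (in the Mahout case the output classes would be VarLongWritable and VectorWritable):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class TwoJobDriver {
    public static void main(String[] args) throws Exception {
        // Intermediate directory written by job 1 and read by job 2 (made-up name)
        Path intermediate = new Path(args[1] + "-intermediate");

        // Job 1: write its <LongWritable, Text> pairs as a SequenceFile,
        // so the key/value classes survive between the two jobs.
        JobConf job1 = new JobConf(TwoJobDriver.class);
        job1.setJobName("job 1");
        FileInputFormat.setInputPaths(job1, new Path(args[0]));
        FileOutputFormat.setOutputPath(job1, intermediate);
        job1.setMapperClass(IdentityMapper.class);     // stand-in mapper
        job1.setReducerClass(IdentityReducer.class);   // stand-in reducer
        job1.setOutputFormat(SequenceFileOutputFormat.class);
        job1.setOutputKeyClass(LongWritable.class);
        job1.setOutputValueClass(Text.class);
        JobClient.runJob(job1);

        // Job 2: read the SequenceFile back, so its mapper receives the same
        // <LongWritable, Text> pairs instead of lines re-parsed by TextInputFormat.
        JobConf job2 = new JobConf(TwoJobDriver.class);
        job2.setJobName("job 2");
        FileInputFormat.setInputPaths(job2, intermediate);
        FileOutputFormat.setOutputPath(job2, new Path(args[1]));
        job2.setInputFormat(SequenceFileInputFormat.class);
        job2.setMapperClass(IdentityMapper.class);     // stand-in mapper
        job2.setReducerClass(IdentityReducer.class);   // stand-in reducer
        job2.setOutputKeyClass(LongWritable.class);
        job2.setOutputValueClass(Text.class);
        JobClient.runJob(job2);
    }
}

Because job 2 reads the intermediate directory with SequenceFileInputFormat, its mapper receives exactly the key and value classes that job 1 wrote.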


I am a beginner in Hadoop, so this is just my guess at an answer; please bear with me and point it out if it seems naive.

I think it's not reasonable to send data from a reducer straight to the next mapper without saving it on HDFS, because the assignment of "which split of data goes to which mapper" is deliberately designed to meet the data-locality criterion (each split goes to a mapper on a node that has that data stored locally).

If you don't store it on HDFS, most likely all the data will have to be transferred over the network, which is slow and may cause bandwidth problems.


You have to temporarily save the output of the first map-reduce job so that the second one can use it.

This might help you understand how the output of the first map-reduce job is passed to the second one (this is based on Generator.java from Apache Nutch).

This is the temporary dir for the output of the first map-reduce job:

Path tempDir = new Path(getConf().get("mapred.temp.dir", ".")
        + "/job1-temp-"
        + Integer.toString(new Random().nextInt(Integer.MAX_VALUE)));

Setting up first map-reduce job:

JobConf job1 = new JobConf(getConf());
job1.setJobName("job 1");
FileInputFormat.addInputPath(...);
job1.setMapperClass(...);
FileOutputFormat.setOutputPath(job1, tempDir);
job1.setOutputFormat(SequenceFileOutputFormat.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(...);
JobClient.runJob(job1);

Observe that the output dir is set in the job configuration. Use that same directory as the input of the 2nd job:

JobConf job2 = new JobConf(getConf());
FileInputFormat.addInputPath(job2, tempDir);
job2.setReducerClass(...);
JobClient.runJob(job2);

Remember to clean up the temp dirs after you are done:

// clean up
FileSystem fs = FileSystem.get(getConf());
fs.delete(tempDir, true);
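As a small variation (not from the original code), you could wrap the second job and the cleanup in a try/finally so the temp dir is removed even if job 2 fails:

FileSystem fs = FileSystem.get(getConf());
try {
    JobClient.runJob(job2);
} finally {
    // recursive delete of the intermediate output
    fs.delete(tempDir, true);
}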

Hope this helps.