
Hadoop Map-Reduce OutputFormat for assigning result to in-memory variable (not files)?


The problem with this idea is that Hadoop has no notion of "distributed memory". If you want the result "in memory", the next question has to be "which machine's memory?" If you really want to access it like that, you will have to write your own custom OutputFormat, and then either use an existing framework for sharing memory across machines or, again, write your own.

My suggestion would be to simply write to HDFS as normal and then, in the non-MapReduce business logic, start by reading the data back from HDFS via the FileSystem API, e.g.:

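    FileSystem fs = FileSystem.get(conf);
    Path outputPath = new Path("/foo/bar");
    // A job's output path is a directory, so open each part-* file inside it
    for (FileStatus status : fs.globStatus(new Path(outputPath, "part-*"))) {
        try (FSDataInputStream in = fs.open(status.getPath())) {
            // read data and store in memory
        }
    }
    fs.delete(outputPath, true); // recursively delete the output directory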
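For the "read data and store in memory" step, here is a minimal sketch that turns the contents of one part file into a `Map`. It assumes the job used `TextOutputFormat` with its default tab separator, so each line is `key<TAB>value`; the class and method names (`PartFileReader`, `readTextOutput`) are hypothetical, and a `StringReader` stands in for the HDFS stream so the example runs without a cluster.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class PartFileReader {

    // Hypothetical helper. Assumes TextOutputFormat's default "\t" separator,
    // i.e. each line looks like "key<TAB>value".
    static Map<String, String> readTextOutput(BufferedReader reader) {
        Map<String, String> result = new HashMap<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    // TextOutputFormat omits the separator when the key or the value
                    // is null/NullWritable; this sketch stores such lines whole as the key.
                    result.put(line, "");
                } else {
                    // Split on the first tab only, so values may themselves contain tabs
                    result.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // In the real job this reader would wrap the FSDataInputStream of a part file
        String sample = "apple\t3\nbanana\t5\n";
        Map<String, String> counts = readTextOutput(new BufferedReader(new StringReader(sample)));
        System.out.println(counts.get("banana")); // prints 5
    }
}
```

Against HDFS, you would pass `new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))` for each opened part file and merge the resulting maps. Returning plain `String`s keeps the sketch independent of Hadoop's `Writable` types; if you need typed values, parse them at this point.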

Sure, it does some unnecessary disk reads and writes, but if your data is small enough to fit in memory, why are you worried about it anyway? I'd be surprised if that were a serious bottleneck.