Hadoop Map-Reduce OutputFormat for assigning result to in-memory variable (not files)?

hadoop io mapreduce distributed-objects

The problem with this idea is that Hadoop has no notion of "distributed memory". If you want the result "in memory" the next question has to be "which machine's memory?" If you really want to access it like that, you're going to have to write your own custom output format, and then also either use some existing framework for sharing memory across machines, or again, write your own.

My suggestion would be to simply write to HDFS as normal, and then for the non-MapReduce business logic just start by reading the data from HDFS via the FileSystem API, i.e.:

FileSystem fs = new JobClient(conf).getFs();Path outputPath = new Path("/foo/bar");FSDataInputStream in = fs.open(outputPath);// read data and store in memoryfs.delete(outputPath, true);

Sure, it does some unnecessary disk reads and writes, but if your data is small enough to fit in-memory, why are you worried about it anyway? I'd be surprised if that was a serious bottleneck.

CodeHunter

Hadoop Map-Reduce OutputFormat for assigning result to in-memory variable (not files)?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last