Is Mapper Object of Hadoop Shared across Multiple Threads?

multithreading hadoop

As long as you are not using the MultithreadedMapper class but your own Mapper, there will be no problem: map() is called sequentially, not in parallel.
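
This is a stand-alone simulation with no Hadoop dependency, so it runs on its own. It mirrors only the shape of the default Mapper.run() loop in the new API: setup() once, then map() for each record, then cleanup(), all on the task's single thread. LineLengthMapper and runTask are hypothetical names, not part of Hadoop. The unsynchronized StringBuilder field is safe because map() is never entered by two threads at once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class SequentialMapDemo {

    // Hypothetical stand-in for a Mapper subclass (not the Hadoop API).
    static class LineLengthMapper {
        // Unsynchronized state kept across map() calls. This is safe only
        // because a single thread calls map() for one record at a time.
        private final StringBuilder buffer = new StringBuilder();
        private final List<String> output = new ArrayList<>();

        void setup() {
            buffer.setLength(0);
        }

        // Called once per record; appends the length of each line.
        void map(long offset, String line) {
            if (buffer.length() > 0) {
                buffer.append(',');
            }
            buffer.append(line.length());
        }

        // Emits everything that was buffered across the map() calls.
        void cleanup() {
            output.add(buffer.toString());
        }

        // Same shape as the default run(): a plain loop, no extra threads.
        void run(Iterator<String> records) {
            setup();
            try {
                long offset = 0;
                while (records.hasNext()) {
                    String line = records.next();
                    map(offset, line);
                    offset += line.length() + 1; // +1 for the newline
                }
            } finally {
                cleanup();
            }
        }
    }

    // Runs one simulated map task over one input split.
    static List<String> runTask(List<String> split) {
        LineLengthMapper mapper = new LineLengthMapper();
        mapper.run(split.iterator());
        return mapper.output;
    }

    public static void main(String[] args) {
        System.out.println(runTask(List.of("a", "bb", "ccc"))); // [1,2,3]
    }
}
```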

It is common to use a StringBuilder or other data structures to buffer a few objects between calls. But make sure you copy (clone) the data from your input objects: the framework hands you the same input object every time and refills it for each record, to avoid creating lots of garbage for the GC.
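
Here is a minimal sketch of that reuse pitfall, again without Hadoop. A single StringBuilder plays the role of the reused input Writable, and the loop stands in for the framework refilling it for each record. bufferWithoutCopy and bufferWithCopy are hypothetical helpers. In real Hadoop code the copy would usually be something like value.toString(), or new Text(value) for a Text value.

```java
import java.util.ArrayList;
import java.util.List;

class ReuseDemo {

    // Buffers the reference it was handed: every entry ends up pointing
    // at the same, repeatedly overwritten object.
    static List<String> bufferWithoutCopy(List<String> records) {
        StringBuilder reused = new StringBuilder(); // the single reused "value"
        List<StringBuilder> buffered = new ArrayList<>();
        for (String record : records) {
            reused.setLength(0);
            reused.append(record);   // "framework" refills the same object
            buffered.add(reused);    // BUG: stores a reference, not a copy
        }
        List<String> result = new ArrayList<>();
        for (StringBuilder sb : buffered) {
            result.add(sb.toString());
        }
        return result;
    }

    // Copies the current contents before buffering them.
    static List<String> bufferWithCopy(List<String> records) {
        StringBuilder reused = new StringBuilder();
        List<String> buffered = new ArrayList<>();
        for (String record : records) {
            reused.setLength(0);
            reused.append(record);
            buffered.add(reused.toString()); // immutable snapshot of this record
        }
        return buffered;
    }

    public static void main(String[] args) {
        List<String> input = List.of("alpha", "beta", "gamma");
        System.out.println(bufferWithoutCopy(input)); // [gamma, gamma, gamma]
        System.out.println(bufferWithCopy(input));    // [alpha, beta, gamma]
    }
}
```

The reference-buffering version sees only the last record three times, which is exactly the symptom people hit when they keep Hadoop input objects around without copying them.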

So there is no need to synchronize or take care of race conditions.

I don't think that's possible. Each mapper runs in its own JVM (and the mappers are typically distributed across different machines), so there's no easy way to share a variable or object across multiple mappers or reducers.

Now, if all your mappers run on the same node, I believe there is a configuration option for JVM reuse (mapred.job.reuse.jvm.num.tasks in classic MapReduce), but honestly I wouldn't bother with that, especially if all you need is a StringBuilder :)

I've seen this question once before, and it was solved very easily by changing the design of the application. Maybe you can tell us more about what you're trying to accomplish, so we can see whether this is really needed. If you really do need it, you can still serialize your object, put it in HDFS, and then have each mapper read and deserialize it, but that seems backwards.
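
This is a rough sketch of that serialize-and-reload workaround, assuming plain Java serialization. A local temp file stands in for HDFS; in a real job you would presumably write the bytes with FileSystem.create in the driver and read them back with FileSystem.open in each mapper's setup(). SharedLookup is a hypothetical class. Every reader gets its own independent copy, so this shares data, not a live object.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class SharedObjectDemo {

    // Hypothetical object that every task needs to read.
    static class SharedLookup implements Serializable {
        private static final long serialVersionUID = 1L;
        final Map<String, Integer> table = new HashMap<>();
    }

    // Driver side: serialize the object once, before the tasks start.
    static void write(SharedLookup lookup, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(lookup);
        }
    }

    // Task side (e.g. a mapper's setup()): each call builds a fresh copy.
    static SharedLookup read(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (SharedLookup) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("shared-lookup", ".ser"); // stand-in for HDFS
        try {
            SharedLookup lookup = new SharedLookup();
            lookup.table.put("hadoop", 1);
            write(lookup, file);

            SharedLookup copyA = read(file); // "mapper A"
            SharedLookup copyB = read(file); // "mapper B"
            copyA.table.put("only-in-a", 2); // not visible to B: no real sharing
            System.out.println(copyB.table); // {hadoop=1}
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```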

CodeHunter
