Is Mapper Object of Hadoop Shared across Multiple Threads?


As long as you are using your own Mapper class and not the MultithreadedMapper class, there will be no problem. Within a task, map() is called sequentially, not in parallel.

It is common to use a StringBuilder or other data structures to buffer a few objects between the calls. But make sure you copy anything you want to keep from your input objects: to avoid a lot of GC, the framework keeps only one key object and one value object and refills them for every record, so a stored reference will silently change on the next call.

So there is no need to synchronize or take care of race conditions.
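
To make the copying advice above concrete, here is a minimal sketch in plain Java with no Hadoop dependency. `MutableText` and `ReuseDemo` are hypothetical names made up for illustration; `MutableText` only imitates the way a reusable value object is overwritten for each record, it is not Hadoop's actual class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class (not part of Hadoop): simulates buffering
// input objects across sequential map() calls.
final class ReuseDemo {

    // Buggy: keeps a reference to the single reused input object.
    static List<String> bufferReferences(String[] records) {
        MutableText reused = new MutableText();
        List<MutableText> buffer = new ArrayList<>();
        for (String record : records) {
            reused.set(record);   // the "framework" refills the same instance
            buffer.add(reused);   // the "map() call" stores the reference
        }
        return render(buffer);
    }

    // Correct: copies the contents before buffering.
    static List<String> bufferCopies(String[] records) {
        MutableText reused = new MutableText();
        List<MutableText> buffer = new ArrayList<>();
        for (String record : records) {
            reused.set(record);
            MutableText copy = new MutableText();
            copy.set(reused.toString()); // independent copy of the current value
            buffer.add(copy);
        }
        return render(buffer);
    }

    private static List<String> render(List<MutableText> buffer) {
        List<String> out = new ArrayList<>();
        for (MutableText t : buffer) {
            out.add(t.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] records = {"a", "b", "c"};
        System.out.println(bufferReferences(records)); // prints [c, c, c]
        System.out.println(bufferCopies(records));     // prints [a, b, c]
    }
}

// Hypothetical stand-in for a reusable value object: one instance whose
// contents are overwritten for every record.
final class MutableText {
    private String value = "";

    void set(String v) {
        value = v;
    }

    @Override
    public String toString() {
        return value;
    }
}
```

In a real Mapper the copy would typically be something like `value.toString()` or `new Text(value)` before adding to your buffer. Note that no locking appears anywhere in the sketch: the calls happen one after another on a single thread.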


I don't think that's possible. The reason is that each mapper runs in its own JVM (and the mappers may be distributed across different machines), so there's no easy way to share a variable or object across multiple mappers or reducers.

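Now if all your mappers run on the same node, there is a configuration for JVM reuse (`mapred.job.reuse.jvm.num.tasks` in classic MapReduce), but even then the tasks run one after another in the reused JVM, and honestly I wouldn't bother with that, especially if all you need is a StringBuilder :)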

I've seen this question once before, and it could be solved very easily by changing the design of the application. Maybe you can tell us more about what you're trying to accomplish, so we can see whether this is really needed. If you really need it, you can still serialize your object, put it in HDFS, and have each mapper read and deserialize it, but that seems backwards.
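
For what it's worth, here is a rough sketch of that serialize-and-reload round trip, mainly to show why it does not give you a shared object. It uses plain Java serialization and a local temp file as a stand-in for an HDFS path (an assumption made to keep the example self-contained; in a real job you would go through Hadoop's `FileSystem` API instead). `SharedStateDemo` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical demo class: a local file stands in for a file on HDFS.
final class SharedStateDemo {

    // "Driver" side: serialize the object once before the job starts.
    static void write(Path file, Serializable state) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(state);
        }
    }

    // "Mapper" side: every mapper deserializes its own private copy.
    static Object read(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("shared-state", ".ser");
        try {
            ArrayList<String> state = new ArrayList<>(Arrays.asList("alpha", "beta"));
            write(file, state);

            Object mapperA = read(file); // what one mapper would see
            Object mapperB = read(file); // what another mapper would see

            System.out.println(mapperA.equals(state)); // prints true
            System.out.println(mapperA == mapperB);    // prints false
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Each reader ends up with an equal but separate object, so a change made in one mapper is never visible in another. That is fine for read-only lookup data, but it is not a way to share mutable state.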