
Share large object between internal thread-mappers of MultithreadedMapper class in Hadoop MapReduce?


Caveat: I've never done this myself, but I would implement it with a static field whose initialization is guarded by a static lock:

    // Nested in your job class. MyResource and createResource() are your own code.
    static class MySingleThreadMapper extends Mapper<LongWritable, Text, Text, Text> {

        // Static, so every mapper thread in this task's JVM sees the same instance.
        static MyResource sharedResource;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            super.setup(context);
            // Class-level lock: only the first thread to get here creates the resource.
            synchronized (MySingleThreadMapper.class) {
                if (sharedResource == null) {
                    sharedResource = createResource();
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // mapper code
            // sharedResource is already initialized at this point
        }
    }

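As you may already know, Hadoop runs each map and reduce task in its own JVM instance. MultithreadedMapper starts its internal mapper threads inside a single map task, so all of your single-thread mappers run in the same JVM, see the same static fields, and can rely on a static lock. Any other static object works as the lock too; either way the shared resource is initialized only once per task JVM. Keep in mind that the lock only guards initialization: if the mappers modify the resource afterwards, the resource itself has to be thread-safe.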
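To check the "initialized only once" claim without a cluster, here is a minimal sketch of the same pattern outside Hadoop. It assumes plain Java threads as stand-ins for the mapper threads that MultithreadedMapper starts, and it uses a dedicated static lock object instead of the class literal. The names SharedResourceDemo, LOCK, CREATED and runMappers, and the body of MyResource, are made up for illustration and are not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class SharedResourceDemo {

    // Hypothetical stand-in for the large shared object.
    static final class MyResource {
        final int[] data = new int[1_000];
    }

    // Dedicated static lock object, used instead of the class literal.
    private static final Object LOCK = new Object();

    // Read and written only while holding LOCK.
    private static MyResource sharedResource;

    // Counts how many times the resource was actually built.
    private static final AtomicInteger CREATED = new AtomicInteger();

    // Plays the role of Mapper.setup(): each simulated mapper thread calls it once.
    static MyResource setup() {
        synchronized (LOCK) {
            if (sharedResource == null) {
                sharedResource = new MyResource();
                CREATED.incrementAndGet();
            }
            return sharedResource;
        }
    }

    // Starts the given number of simulated mapper threads, waits for them,
    // and returns how many times the resource has been created in total.
    static int runMappers(int threads) {
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(SharedResourceDemo::setup);
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println("resource created " + runMappers(8) + " time(s)");
    }
}
```

Running main prints "resource created 1 time(s)" no matter how many threads are started, because every thread goes through the same synchronized block before it touches the static field. In a real job the body of setup() would live in the mapper's own setup(Context) method, as in the answer above.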