
How to pass object to Mapper and reducers


You want to use setClass() on the Configuration, as you can see here. You can then instantiate your class with newInstance(). Remember to do the instantiation in the setup() method of the mapper/reducer, so that you don't instantiate the filter every time map()/reduce() is invoked. Good luck.

--Edit: I should add that you have access to the Configuration through the context, and that is how you will get the class you need. There is a getClass() method in the Configuration API.
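
A minimal sketch of that approach, assuming your filter type is FieldFilter; the configuration key "field.filter.class", MyFieldFilter, and DefaultFieldFilter are placeholders for your own names:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.util.ReflectionUtils;

    // Driver: register the concrete filter class in the job configuration.
    Configuration conf = new Configuration();
    conf.setClass("field.filter.class", MyFieldFilter.class, FieldFilter.class);
    Job job = Job.getInstance(conf, "filter job");

    // Mapper: look the class up again and instantiate it once in setup().
    public class FilterMapper extends Mapper<LongWritable, Text, Text, Text> {
        private FieldFilter filter;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            Configuration c = context.getConfiguration();
            Class<? extends FieldFilter> cls =
                c.getClass("field.filter.class", DefaultFieldFilter.class, FieldFilter.class);
            // Instantiated once per task, not once per map() call.
            filter = ReflectionUtils.newInstance(cls, c);
        }
    }
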


Serialize the FieldFilter, put it in HDFS, and later read it in the mapper/reducer using the HDFS API. If you have a large cluster, you might want to increase the replication factor of the serialized FieldFilter file (it defaults to 3), since a large number of mapper and reducer tasks will be reading it.
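
A sketch of the write side, assuming FieldFilter implements java.io.Serializable; the path and replication value are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.io.ObjectOutputStream;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path filterPath = new Path("/tmp/field-filter.ser");   // placeholder path

    // Write the serialized filter into HDFS from the driver.
    try (ObjectOutputStream out = new ObjectOutputStream(fs.create(filterPath))) {
        out.writeObject(fieldFilter);                      // your configured FieldFilter instance
    }

    // Raise replication above the default of 3 so many tasks can read it quickly.
    fs.setReplication(filterPath, (short) 10);
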

If the new MapReduce API is used, the serialized FieldFilter file can be read in the Mapper.setup() function, which is called during the initialization of the map task. I could not find something similar in the old MapReduce API.
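
The read side in setup() might look like this, again assuming Java serialization and the same placeholder path as above:

    import java.io.IOException;
    import java.io.ObjectInputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class FilterMapper extends Mapper<LongWritable, Text, Text, Text> {
        private FieldFilter filter;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            Configuration conf = context.getConfiguration();
            FileSystem fs = FileSystem.get(conf);
            Path filterPath = new Path("/tmp/field-filter.ser");  // placeholder path
            // Deserialize the filter once per task.
            try (ObjectInputStream in = new ObjectInputStream(fs.open(filterPath))) {
                filter = (FieldFilter) in.readObject();
            } catch (ClassNotFoundException e) {
                throw new IOException(e);
            }
        }
    }
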

You can also consider using the DistributedCache to distribute the serialized FieldFilter file to the different nodes.
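
With the new API this is done through Job.addCacheFile(); a sketch, again with placeholder paths and the "#filter.ser" symlink name chosen for illustration:

    // Driver: ship the serialized filter with the job.
    job.addCacheFile(new java.net.URI("/tmp/field-filter.ser#filter.ser"));

    // Mapper.setup(): the cached file is available in the task's working
    // directory under the symlink name given after '#'.
    try (java.io.ObjectInputStream in = new java.io.ObjectInputStream(
            new java.io.FileInputStream("filter.ser"))) {
        filter = (FieldFilter) in.readObject();
    } catch (ClassNotFoundException e) {
        throw new IOException(e);
    }
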