Propagating custom configuration values in Hadoop


Unless I'm missing something, if you have a Properties object containing every property you need in your M/R job, you simply need to write the content of the Properties object to the Hadoop Configuration object. For example, something like this:

    import java.util.Map.Entry;
    import java.util.Properties;
    import org.apache.hadoop.conf.Configuration;

    Configuration conf = new Configuration();
    Properties params = getParameters(); // do whatever you need here to create your object
    for (Entry<Object, Object> entry : params.entrySet()) {
        String propName = (String) entry.getKey();
        String propValue = (String) entry.getValue();
        conf.set(propName, propValue);
    }
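One caveat with the loop above: entrySet() does not include values inherited from a defaults Properties object, and the String casts throw a ClassCastException if someone put a non-String value in there. A minimal sketch of a safer variant using stringPropertyNames() follows. The class name PropertiesFlattener and the intermediate Map are made up for illustration (the Map stands in for the Configuration so the snippet runs without Hadoop on the classpath); in a real job you would call conf.set(name, value) for each pair instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of Hadoop or the JDK.
final class PropertiesFlattener {

    /** Copies every String-valued property, including inherited defaults, into a map. */
    static Map<String, String> flatten(Properties params) {
        Map<String, String> flat = new HashMap<>();
        // stringPropertyNames() walks the defaults chain and skips any entry
        // whose key or value is not a String.
        for (String name : params.stringPropertyNames()) {
            flat.put(name, params.getProperty(name));
        }
        return flat;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("job.retries", "3");

        Properties params = new Properties(defaults);
        params.setProperty("something", "some value");
        params.put("not.a.string", 42); // skipped by stringPropertyNames()

        // In a real job each pair would be passed to conf.set(name, value).
        flatten(params).forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```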

Then inside your M/R job, you can use the Context object to get back your Configuration in both the mapper (the map function) and the reducer (the reduce function), like this:

    public void map(MD5Hash key, OverlapDataWritable value, Context context) {
        Configuration conf = context.getConfiguration();
        String someProperty = conf.get("something");
        ....
    }

Note that the Context is also passed to the setup and cleanup methods, so you can read the Configuration there as well, which is useful if you need to do some initialization.

It's also worth mentioning that you could probably call the addResource method on the Configuration object to add your properties directly from an InputStream or a file, but I believe the resource has to be in the same XML format as the regular Hadoop configs, so that might just be overkill.
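For reference, the regular Hadoop XML configs use a configuration root element containing property entries, each with a name and a value. A minimal sketch of such a resource, reusing the "something" key from the mapper example with a placeholder value:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- One <property> element per key/value pair -->
  <property>
    <name>something</name>
    <value>some value</value>
  </property>
</configuration>
```

If all you have is a java.util.Properties object, you would need to generate this file first, which is why copying the values with conf.set is usually the simpler route.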

EDIT: For non-String objects, I would advise using serialization. Serialize each object to bytes, encode the bytes as a String (Base64, for example, since I'm not sure what would happen with unusual characters), and store that String in the Configuration. Then, on the mapper/reducer side, read the String back from the Configuration, decode it, and deserialize the object.
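A minimal sketch of that round trip, assuming the objects implement java.io.Serializable and that java.util.Base64 (Java 8 or later) is available. The class name ConfigObjectCodec is made up for illustration, and checked exceptions are wrapped in unchecked ones only to keep the calling code short:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

// Hypothetical helper, not part of Hadoop or the JDK.
final class ConfigObjectCodec {

    /** Java-serializes the value and returns the bytes as a Base64 String. */
    static String encode(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Base64 output is plain ASCII, so it is safe to store as a property value.
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Reverses encode(): Base64-decodes the String and deserializes the object. */
    static Object decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            // The class of the stored object must be on the task's classpath.
            throw new IllegalStateException(e);
        }
    }
}
```

On the driver side this would be used as conf.set("my.object", ConfigObjectCodec.encode(myObject)), and in the mapper or reducer as ConfigObjectCodec.decode(conf.get("my.object")), where "my.object" is just a placeholder key. Keep in mind the Configuration is shipped to every task, so this is best kept to small objects.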

Another approach would be to use the same serialization technique, but write the serialized objects to files on HDFS instead, and then add these files to the DistributedCache. It sounds like a bit of overkill, but it would probably work.