Connecting to Accumulo inside a Mapper using Kerberos Connecting to Accumulo inside a Mapper using Kerberos hadoop hadoop

Connecting to Accumulo inside a Mapper using Kerberos


The provided AccumuloInputFormat and AccumuloOutputFormat have a method to set the token in the job configuration with the Accumulo*putFormat.setConnectorInfo(job, principle, token). You can also serialize the token in a file in HDFS, using the AuthenticationTokenSerializer and use the version of the setConnectorInfo method which accepts a file name.

If a KerberosToken is passed in, the job will create a DelegationToken to use, and if a DelegationToken is passed in, it will just use that.

The provided AccumuloInputFormat should handle its own scanner, so normally, you shouldn't have to do that in your Mapper if you've set the configuration properly. However, if you're doing secondary scanning (for something like a join) inside your Mapper, you can inspect the provided AccumuloInputFormat's RecordReader source code for an example of how to retrieve the configuration and construct a Scanner.