How do I create a Spark RDD from Accumulo 1.6 in spark-notebook?


Generally, with custom Hadoop InputFormats, the configuration is supplied through a JobConf (or, with the newer mapreduce API, through a Job's Configuration). As @Sietse pointed out, AccumuloInputFormat provides static methods that you can use to populate it. In this case I think what you would want to do is:

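    import org.apache.accumulo.core.client.ClientConfiguration
    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
    import org.apache.accumulo.core.client.security.tokens.PasswordToken
    import org.apache.accumulo.core.data.{Key, Value}
    import org.apache.accumulo.core.security.Authorizations
    import org.apache.hadoop.mapreduce.Job

    // Placeholder connection details: replace these with your own
    val instanceName = "myInstance"
    val zooKeepers = "zkhost1:2181,zkhost2:2181"
    val principal = "root"
    val token = new PasswordToken("secret")
    val authorizations = new Authorizations()
    val tableName = "myTable"

    // The mapreduce (new API) AccumuloInputFormat is configured through a Job
    val job = Job.getInstance(sc.hadoopConfiguration)

    // Configure the job with our Accumulo properties
    AccumuloInputFormat.setConnectorInfo(job, principal, token)
    AccumuloInputFormat.setScanAuthorizations(job, authorizations)
    val clientConfig = new ClientConfiguration().withInstance(instanceName).withZkHosts(zooKeepers)
    AccumuloInputFormat.setZooKeeperInstance(job, clientConfig)
    AccumuloInputFormat.setInputTableName(job, tableName)

    // Create an RDD of (Key, Value) pairs from the job's configuration
    val rdd2 = sc.newAPIHadoopRDD(job.getConfiguration,
      classOf[AccumuloInputFormat], classOf[Key], classOf[Value])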

Note: After digging into the code, it seems that the "is configured" property is keyed partly on the class through which the setter is called (which makes sense, since it avoids conflicts with other packages). So when the concrete class later looks the property up, it fails to find the "is configured" flag. The solution is not to call these methods through the abstract classes; see https://github.com/apache/accumulo/blob/bf102d0711103e903afa0589500f5796ad51c366/core/src/main/java/org/apache/accumulo/core/client/mapreduce/lib/impl/ConfiguratorBase.java#L127 for the implementation details. If you can't call these methods on the concrete implementation from spark-notebook, using spark-shell or a regularly built application is probably the easiest solution.
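To see why a flag written under one class can go missing when read under another, here is a simplified, hypothetical model of class-namespaced configuration keys. It is not Accumulo's code: the classes `AbstractFormat` and `ConcreteFormat`, the helper names, and the key format are all made up for illustration, and a plain mutable map stands in for the Hadoop Configuration. Accumulo's real key construction is in the ConfiguratorBase source linked in the note.

```scala
import scala.collection.mutable

// Hypothetical model of class-namespaced configuration keys (NOT Accumulo's code).
object NamespacedConfDemo {
  // Made-up stand-ins for an abstract base class and its concrete subclass
  abstract class AbstractFormat
  class ConcreteFormat extends AbstractFormat

  // Prefix every property with the class name, e.g. "...ConcreteFormat.isConfigured".
  // getName is used rather than getSimpleName, which can fail on nested Scala classes.
  def confKey(cls: Class[_], property: String): String =
    cls.getName + "." + property

  // Record the flag under the given class's prefix
  def setConfigured(conf: mutable.Map[String, String], cls: Class[_]): Unit =
    conf(confKey(cls, "isConfigured")) = "true"

  // Look the flag up under the given class's prefix only
  def isConfigured(conf: mutable.Map[String, String], cls: Class[_]): Boolean =
    conf.get(confKey(cls, "isConfigured")) == Some("true")

  def main(args: Array[String]): Unit = {
    // Flag written via the abstract class: the concrete class cannot see it
    val viaAbstract = mutable.Map.empty[String, String]
    setConfigured(viaAbstract, classOf[AbstractFormat])
    println(isConfigured(viaAbstract, classOf[ConcreteFormat])) // false

    // Flag written via the concrete class: the lookup succeeds
    val viaConcrete = mutable.Map.empty[String, String]
    setConfigured(viaConcrete, classOf[ConcreteFormat])
    println(isConfigured(viaConcrete, classOf[ConcreteFormat])) // true
  }
}
```

In this model, the only fix is to write and read the flag through the same class, which mirrors the advice above to call the setters on the concrete AccumuloInputFormat.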


It looks like those parameters have to be set through static methods: http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/mapred/AccumuloInputFormat.html. Try setting the required (non-optional) parameters and running again; it should work.