How do you read from and write to different Elasticsearch clusters using Spark and elasticsearch-hadoop?
Spark uses the hadoop-common library for file access, so whatever file systems Hadoop supports will also work with Spark. I've used it with HDFS, S3, and GCS.
I'm not sure I understand why you wouldn't just use elasticsearch-hadoop. Since you have two ES clusters, you need to access them with different configurations. `sc.newAPIHadoopRDD` and `rdd.saveAsNewAPIHadoopDataset` take `org.apache.hadoop.conf.Configuration` arguments, so you can use two ES clusters with the same `SparkContext` without any problems.
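As a concrete sketch: elasticsearch-hadoop's native Spark API (`EsSpark`) also accepts a per-operation settings map that overrides the `SparkConf`, which is often simpler than juggling two Hadoop `Configuration` objects. The cluster addresses and index names below are hypothetical placeholders, not values from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark.rdd.EsSpark

object TwoClusterCopy {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("es-two-clusters"))

    // Read from the first cluster: the per-call config map overrides
    // any es.* settings in the SparkConf for this operation only.
    // "cluster-one:9200" and "source-index/doc" are placeholder values.
    val source = EsSpark.esRDD(sc, "source-index/doc",
      Map("es.nodes" -> "cluster-one:9200"))

    // esRDD yields (documentId, fieldMap) pairs; keep just the documents.
    val docs = source.map(_._2)

    // Write to the second cluster by passing a different es.nodes setting.
    EsSpark.saveToEs(docs, "target-index/doc",
      Map("es.nodes" -> "cluster-two:9200"))

    sc.stop()
  }
}
```

Because the configuration is scoped to each read/write call rather than to the `SparkContext`, both clusters can be used freely within the same job.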