
How do you read and write from/into different ElasticSearch clusters using spark and elasticsearch-hadoop?


Spark uses the hadoop-common library for file access, so any file system Hadoop supports will work with Spark. I've used it with HDFS, S3, and GCS.
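As a sketch of what that looks like in practice: the URI scheme on the path is what selects the Hadoop FileSystem implementation, so the same Spark API covers all three stores. The hosts, buckets, and paths below are hypothetical.

```scala
// Hypothetical paths on three different Hadoop-supported file systems.
// With a live SparkContext these would all be read the same way, e.g.:
//   sc.textFile("hdfs://namenode:8020/data/events.log")
val paths = Seq(
  "hdfs://namenode:8020/data/events.log", // HDFS
  "s3a://my-bucket/data/events.log",      // S3 (via the s3a connector)
  "gs://my-bucket/data/events.log"        // GCS (via the GCS connector)
)

// The scheme prefix is what hadoop-common uses to pick the FileSystem impl.
val schemes = paths.map(p => p.takeWhile(_ != ':'))
```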

I'm not sure I understand why you don't just use elasticsearch-hadoop. You have two ES clusters, so you need to access them with different configurations. sc.newAPIHadoopFile and rdd.saveAsHadoopFile take org.apache.hadoop.conf.Configuration arguments, so you can use two ES clusters with the same SparkContext without any problems.
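A minimal sketch of the idea, assuming elasticsearch-hadoop is on the classpath: each read or write takes its own settings map, so one operation can point `es.nodes` at one cluster and the next at another. The cluster addresses and index names below are hypothetical.

```scala
// Per-operation settings: each map targets a different (hypothetical) cluster.
val readCfg = Map(
  "es.nodes"    -> "cluster-a:9200",   // source cluster
  "es.resource" -> "source-index/doc"
)
val writeCfg = Map(
  "es.nodes"    -> "cluster-b:9200",   // target cluster
  "es.resource" -> "target-index/doc"
)

// With a live SparkContext, elasticsearch-hadoop's native Spark support lets
// you pass these maps per call, sharing the one SparkContext across clusters:
// import org.elasticsearch.spark._
// val docs = sc.esRDD("source-index/doc", readCfg)
// docs.values.saveToEs("target-index/doc", writeCfg)
```

The same separation works with the lower-level route the answer mentions: build two Configuration objects and hand one to sc.newAPIHadoopRDD for the read and the other to the save call for the write.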