Spark - ElasticSearch Index creation performance too slow


  1. Don't pass data through the driver unless it is necessary. Depending on the source of the data returned by getData, use the relevant input method or write your own. If the data comes from MongoDB, for example, use mongo-hadoop, Spark-MongoDB, or Drill with a JDBC connection. Then use map or a similar method to build the required objects, and call saveToEs on the transformed RDD.

  2. Creating an RDD with a single element doesn't make sense. It doesn't benefit from the Spark architecture at all. You just start a potentially huge number of tasks that have nothing to do, with only a single active executor.
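
A minimal sketch of both points, assuming the elasticsearch-hadoop Spark connector is on the classpath. The `Event` case class, the `parse` helper, the input path, the `events` index name and the `localhost:9200` address are all hypothetical placeholders, since the original getData and document shape aren't shown. The data is read and transformed on the executors and written with one saveToEs call over the whole RDD, instead of being collected on the driver and wrapped in single-element RDDs.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // brings saveToEs into scope for RDDs

// Hypothetical document type standing in for whatever getData returns.
case class Event(id: String, user: String, value: Double)

object IndexToEs {
  // Builds one document from one input line.
  // Assumes well-formed "id,user,value" lines; malformed lines will throw.
  def parse(line: String): Event = {
    val Array(id, user, value) = line.split(",", 3)
    Event(id, user, value.toDouble)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("index-to-es")
      .set("es.nodes", "localhost:9200") // assumed Elasticsearch address

    val sc = new SparkContext(conf)

    // Read with a distributed input method so records never pass through the driver.
    // The path is a placeholder; swap in the connector that matches your source.
    val events = sc.textFile("hdfs:///path/to/events.csv").map(parse)

    // One write over the whole RDD: every partition indexes its own records.
    // "events" is an assumed index name; older Elasticsearch versions expect "index/type".
    events.saveToEs("events", Map("es.mapping.id" -> "id"))

    sc.stop()
  }
}
```

With this shape the number of write tasks follows the number of partitions of the input, so all executors stay busy rather than one.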