Spark - ElasticSearch Index creation performance too slow
Don't pass data through the driver unless it is necessary. Depending on the source of the data returned from getData, you should use a relevant input method or create your own. If the data comes from MongoDB, use for example mongo-hadoop, Spark-MongoDB, or Drill with a JDBC connection. Then use map or a similar method to build the required objects and call saveToEs on the transformed RDD.

Creating an RDD with a single element doesn't make sense. It doesn't benefit from the Spark architecture at all. You just start a potentially huge number of tasks that have nothing to do, while only a single executor is active.
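
The approach described above can be sketched roughly as follows. This is only an illustration under assumptions, not the asker's actual pipeline: the input path, the tab-separated record layout, the Elasticsearch address, and the index name "events" are all hypothetical, and sc.textFile stands in for whichever input method matches the real source behind getData. It assumes the elasticsearch-hadoop Spark artifact is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // brings saveToEs into scope for RDDs

object IndexToEs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("index-to-es")
      .set("es.nodes", "localhost:9200") // assumed Elasticsearch address

    val sc = new SparkContext(conf)

    // Load the data on the executors rather than building it on the driver.
    // The path and the "id<TAB>text" layout are hypothetical placeholders.
    val lines = sc.textFile("hdfs:///data/input/*.tsv")

    // Build the documents with a transformation; malformed lines are skipped.
    val docs = lines.flatMap { line =>
      line.split("\t", 2) match {
        case Array(id, text) => Some(Map("id" -> id, "text" -> text))
        case _               => None
      }
    }

    // One distributed write over all partitions.
    // "events" is a hypothetical index; older versions expect "index/type".
    docs.saveToEs("events")

    sc.stop()
  }
}
```

Because the RDD is created by a distributed read, every partition holds real records, so each task has work to do and the writes to Elasticsearch are spread across the executors.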