writing rdd from spark to Elastic Search fails


It looks like a problem with the PySpark computation, not necessarily with the Elasticsearch save step. Check that your RDDs are OK by:

  1. Performing count() on rdd1 (to "materialize" results)
  2. Performing count() on rdd2
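
The two checks above can be wrapped in a small helper. This is only a sketch: `check_rdd` is a hypothetical name, not part of PySpark, and it relies solely on the object having a `count()` method, as RDDs do.

```python
def check_rdd(rdd, name):
    """Force evaluation of an RDD-like object and report how many records it holds.

    `rdd` is anything exposing count(); for a real RDD, count() is an action,
    so it runs the whole lineage and surfaces errors hidden by lazy evaluation.
    """
    try:
        n = rdd.count()
    except Exception as exc:
        # A failure here points at the computation, not at the ES write.
        print(f"{name}: computation failed: {exc!r}")
        raise
    print(f"{name}: {n} records")
    return n
```

In the job you would call `check_rdd(rdd1, "rdd1")` and `check_rdd(rdd2, "rdd2")` before the save. If either call raises, the failure is in the transformations rather than in the Elasticsearch output.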

If the counts are OK, try caching the results before saving to ES:

    res2.cache()
    res2.count()  # to fill the cache
    res2.saveAsNewAPIHadoopFile(...

If the problem still appears, look at the stderr and stdout of the dead executors (you can find them on the Executors tab in the Spark UI).

I also noticed the very small batch size in es_write_conf; try increasing it to 500 or 1000 to get better performance.
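
As a rough illustration, a write configuration with a larger batch size might look like the one below. The host, port and index name are placeholders (assumptions, not taken from the question); `es.batch.size.entries` and `es.batch.size.bytes` are the elasticsearch-hadoop settings that bound each bulk request, and the values are written as strings because they are passed through as Hadoop configuration.

```python
# Sketch of an es_write_conf with a larger bulk batch size.
# Host, port and resource are placeholder values for illustration only.
es_write_conf = {
    "es.nodes": "localhost",             # assumed Elasticsearch host
    "es.port": "9200",                   # assumed HTTP port
    "es.resource": "my_index/my_type",   # hypothetical index/type
    "es.batch.size.entries": "1000",     # max documents per bulk request
    "es.batch.size.bytes": "1mb",        # max payload per bulk request
}
```

A bulk request is sent once either limit is reached, so raising only the entry count may have little effect if the documents are large.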