Elasticsearch Spark reading slow
It is not supposed to be this slow, and the answer can be found in the screenshot you shared:
The Stages: Succeeded/Total column in the Spark UI shows that only one task is running the read operation. That is surely not what you would expect; otherwise, what would be the point of having a whole cluster?
I faced the same problem, and it took me a while to figure out that Spark creates one task (partition) for each shard of the Elasticsearch index.
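You can check this yourself by counting the partitions of the DataFrame you read. Here is a minimal sketch in Scala, assuming the elasticsearch-spark connector is on the classpath; the host, port and index name are placeholders, not values from your setup:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("es-read-parallelism")
      .getOrCreate()

    val df = spark.read
      .format("org.elasticsearch.spark.sql")  // elasticsearch-spark SQL data source
      .option("es.nodes", "localhost")        // Elasticsearch host (placeholder)
      .option("es.port", "9200")
      .load("index-name")                     // the source index (placeholder)

    // One Spark partition per Elasticsearch shard: with a single-shard index
    // this prints 1, meaning the whole read runs as a single task.
    println(df.rdd.getNumPartitions)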
There we have our answer: to go faster, we need to parallelise the read, and the way to do that is to distribute the source index over multiple shards.
By default, Elasticsearch creates an index with a single shard, but it is possible to customise this as shown below:
    PUT /index-name
    {
      "settings": {
        "index": {
          "number_of_shards": x,
          "number_of_replicas": xx
        }
      }
    }
The number of shards can be higher than the number of Elasticsearch nodes; this is all transparent to Spark. If the index already exists, create a new index with more shards and copy the data over with the Elasticsearch Reindex API.
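For reference, a reindex request could look like the following, where index-name-sharded is a hypothetical name for the new index created with more shards:

    POST /_reindex
    {
      "source": { "index": "index-name" },
      "dest":   { "index": "index-name-sharded" }
    }

Once the documents are copied, point your Spark job at the new index and the read should fan out into one task per shard.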
I hope this solves your problem.