Update nested field for millions of documents


As often with performance optimization questions, there is no single answer since there are many possible causes of poor performance.

In your case you are making bulk update requests. When an update is performed, the document is actually being re-indexed:

... to update a document is to retrieve it, change it, and then reindex the whole document.
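To make this concrete, here is a minimal sketch of what a bulk update body looks like on the wire. It assumes a hypothetical index `my-index` whose documents have a hypothetical nested array field `comments`. Each update is an action line followed by a body line, serialized as newline-delimited JSON for the `_bulk` API. A Painless script is used to append to the nested array, because a partial `doc` update would replace the whole array. Either way, Elasticsearch still fetches, modifies and reindexes the entire document for every line pair.

```python
import json
from typing import Iterable, Iterator, Tuple


def bulk_update_lines(index: str, updates: Iterable[Tuple[str, dict]]) -> Iterator[str]:
    """Yield NDJSON lines for a _bulk request that appends one object to a
    nested array field of each document.

    `updates` yields (document id, new nested object) pairs. The field name
    "comments" is a hypothetical example, not taken from the question.
    """
    for doc_id, nested_obj in updates:
        # Action/metadata line: which document in which index to update.
        yield json.dumps({"update": {"_index": index, "_id": doc_id}})
        # Body line: a Painless script appending to the nested array.
        # Only one field changes, but the whole document is still reindexed.
        yield json.dumps({
            "script": {
                "source": (
                    "if (ctx._source.comments == null) { ctx._source.comments = []; } "
                    "ctx._source.comments.add(params.item)"
                ),
                "lang": "painless",
                "params": {"item": nested_obj},
            }
        })


def bulk_body(index: str, updates: Iterable[Tuple[str, dict]]) -> str:
    # The _bulk API expects every line, including the last one, to end with "\n".
    return "".join(line + "\n" for line in bulk_update_lines(index, updates))


if __name__ == "__main__":
    print(bulk_body("my-index", [("1", {"author": "alice", "text": "hi"})]), end="")
```

Since every such pair triggers a full reindex of its document, the cost of updating millions of documents is essentially the cost of indexing them again, which is why indexing tuning applies here.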

Hence it makes sense to look at the indexing performance tuning tips. The first things I would consider in your case are choosing the right bulk size, sending bulk requests from several threads, and increasing or temporarily disabling the index refresh interval.
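Two of these tips can be sketched in plain Python, assuming the updates are already available as (action, body) dict pairs. `chunk_pairs` bounds each bulk request by both document count and approximate payload size; the defaults of 1000 documents and 5 MB are placeholders, and the right values have to be found by benchmarking against your own cluster. `refresh_disabled` sets `refresh_interval` to `-1` for the duration of the load and restores it afterwards. The `put_settings` callable is a hypothetical hook; with the 8.x Python client it could wrap something like `es.indices.put_settings(index="my-index", settings=s)`. Restoring to `"1s"` assumes the index used the default refresh interval.

```python
import json
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator, List, Tuple

Pair = Tuple[dict, dict]  # (action/metadata line, body line)


def chunk_pairs(pairs: Iterable[Pair], max_docs: int = 1000,
                max_bytes: int = 5 * 1024 * 1024) -> Iterator[List[Pair]]:
    """Group (action, body) pairs into bulk batches bounded by both the number
    of documents and the approximate serialized size in bytes."""
    batch: List[Pair] = []
    size = 0
    for action, body in pairs:
        # Bytes this pair adds to the NDJSON payload (two lines + two newlines).
        pair_size = (len(json.dumps(action).encode("utf-8"))
                     + len(json.dumps(body).encode("utf-8")) + 2)
        # Flush the current batch if adding this pair would exceed either limit.
        if batch and (len(batch) >= max_docs or size + pair_size > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append((action, body))
        size += pair_size
    if batch:
        yield batch


@contextmanager
def refresh_disabled(put_settings: Callable[[dict], object], restore: str = "1s"):
    """Disable periodic refresh during a bulk load, then restore it even on error.

    `put_settings` is a hypothetical callable that applies index settings.
    """
    put_settings({"index": {"refresh_interval": "-1"}})
    try:
        yield
    finally:
        put_settings({"index": {"refresh_interval": restore}})
```

Disabling refresh means the updated documents are not visible to searches until the interval is restored (or a refresh is triggered), so this only fits a one-off batch job.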

You might also consider using a ready-made client that supports parallel bulk requests, as the Python Elasticsearch client does.
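In the Python client this is `elasticsearch.helpers.parallel_bulk`, which takes parameters such as `thread_count` and `chunk_size` and returns a lazy generator that must be consumed for any request to be sent. To show what such a helper does internally, here is a stdlib-only sketch. `send_batch` is a hypothetical callable that would post one batch to `_bulk` and return how many documents succeeded; the sketch keeps at most `max_in_flight` batches pending so that millions of documents are never all held in memory at once.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def send_in_parallel(batches: Iterable[List[T]],
                     send_batch: Callable[[List[T]], int],
                     threads: int = 4,
                     max_in_flight: int = 8) -> int:
    """Send bulk batches from a thread pool and return the summed results.

    `send_batch` is a hypothetical function (e.g. one _bulk call per batch).
    Exceptions raised in a worker are re-raised here via Future.result().
    """
    total = 0
    in_flight = set()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for batch in batches:
            # Apply back-pressure: wait for a batch to finish before queuing more.
            if len(in_flight) >= max_in_flight:
                done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
                total += sum(f.result() for f in done)
            in_flight.add(pool.submit(send_batch, batch))
        # Collect whatever is still pending after the input is exhausted.
        for f in in_flight:
            total += f.result()
    return total
```

The useful thread count depends on how many bulk requests the cluster can absorb concurrently; beyond that point, extra threads mostly produce rejected requests rather than higher throughput.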

It would be ideal to monitor Elasticsearch performance metrics to understand where the bottleneck is and whether your tweaks actually improve performance. Here is an overview blog post about Elasticsearch performance metrics.
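As one simple example of such a metric, indexing throughput can be estimated from two snapshots of `GET _nodes/stats/indices/indexing` taken some seconds apart, using the `index_total` and `index_time_in_millis` counters. This sketch takes the parsed JSON responses as plain dicts. The counters are summed over all nodes, so, as far as I know, writes to replica shards are counted as well; compare numbers before and after a tuning change rather than reading them as absolute document counts.

```python
from typing import Dict, Tuple


def _indexing_totals(stats: dict) -> Tuple[int, int]:
    """Sum index_total and index_time_in_millis over all nodes in a
    _nodes/stats response."""
    ops = ms = 0
    for node in stats["nodes"].values():
        indexing = node["indices"]["indexing"]
        ops += indexing["index_total"]
        ms += indexing["index_time_in_millis"]
    return ops, ms


def indexing_rate(before: dict, after: dict, elapsed_s: float) -> Dict[str, float]:
    """Indexing operations per second and average time per operation between
    two node-stats snapshots taken elapsed_s seconds apart."""
    ops0, ms0 = _indexing_totals(before)
    ops1, ms1 = _indexing_totals(after)
    ops = ops1 - ops0
    return {
        "ops_per_sec": ops / elapsed_s,
        "avg_ms_per_op": (ms1 - ms0) / ops if ops else 0.0,
    }
```

If throughput stays flat while you add threads or grow batches, the bottleneck is likely elsewhere (disk, merges, or CPU on the data nodes), which is exactly what the broader metrics help to pinpoint.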