What is the ideal bulk size formula in ElasticSearch?
There is no golden rule for this. Quoting the documentation:
There is no “correct” number of actions to perform in a single bulk call. You should experiment with different settings to find the optimum size for your particular workload.
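Read the Elasticsearch documentation on sizing bulk requests carefully: https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html#_using_and_sizing_bulk_requests
- Try 1 KiB, then 20 KiB, then 10 KiB, and so on, narrowing in on the best size by bisection (dichotomy)
- Measure bulk size in KiB (or an equivalent byte unit), not in document count!
- Send data in bulk (no streaming), and if you can, pass redundant information (such as the index name) once in the API URL instead of repeating it in every action
- Remove superfluous whitespace from your data if possible
- Disable index refreshes (the index.refresh_interval setting, which controls when new documents become searchable) during the load, and re-enable them afterwards
- Round-robin your requests across all your data nodes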
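To make the "size in bytes, not document count" advice concrete, here is a minimal sketch that groups documents into bulk bodies under a byte budget. It uses only the JDK and does not send anything. The class name BulkBatcher, the method buildBulkBodies and the 60-byte budget are made up for illustration, and it assumes the index name is supplied once in the URL (for example POST /my-index/_bulk, with my-index as a placeholder), so each action line can stay as short as {"index":{}}.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class BulkBatcher {
    // Short action line: assumes the index name is given in the request URL,
    // so it does not have to be repeated for every document.
    private static final String ACTION_LINE = "{\"index\":{}}\n";

    /**
     * Groups compact, single-line JSON documents into bulk bodies.
     * A body is closed as soon as the next entry would push it past maxBytes.
     * A single entry larger than maxBytes still gets a body of its own.
     */
    static List<String> buildBulkBodies(List<String> jsonDocs, int maxBytes) {
        List<String> bodies = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (String doc : jsonDocs) {
            // One bulk entry = action line + document line, each newline-terminated.
            String entry = ACTION_LINE + doc + "\n";
            int entryBytes = entry.getBytes(StandardCharsets.UTF_8).length;
            if (currentBytes > 0 && currentBytes + entryBytes > maxBytes) {
                bodies.add(current.toString());
                current = new StringBuilder();
                currentBytes = 0;
            }
            current.append(entry);
            currentBytes += entryBytes;
        }
        if (currentBytes > 0) {
            bodies.add(current.toString());
        }
        return bodies;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("{\"user\":\"a\"}", "{\"user\":\"b\"}", "{\"user\":\"c\"}");
        // Tiny budget so the split is visible; real budgets are in the KiB to MiB range.
        for (String body : buildBulkBodies(docs, 60)) {
            System.out.println(body.getBytes(StandardCharsets.UTF_8).length + " bytes:");
            System.out.print(body);
        }
    }
}
```

To search for a good size, one could time the indexing of the same data set with different maxBytes values and keep the fastest one. The timing harness and the HTTP client are left out here on purpose.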
I derived this information from the Java API's BulkProcessor class. It defaults to 1000 actions or 5 MB per bulk request. It also lets you set a flush interval, but none is set by default. I'm just using the default settings.
I'd suggest using BulkProcessor if you are using the Java API.
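To illustrate what "1000 actions or 5 MB" means as a flush policy, here is a rough JDK-only sketch. It is not BulkProcessor's implementation and does not use the Elasticsearch client. ThresholdBuffer and its methods are hypothetical names, and the choice of ">=" for both thresholds is an assumption. The idea is simply to buffer actions and hand them off as soon as either limit is reached.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ThresholdBuffer {
    private final int maxActions;
    private final long maxBytes;
    private final Consumer<List<String>> flusher; // receives each completed batch
    private List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;

    ThresholdBuffer(int maxActions, long maxBytes, Consumer<List<String>> flusher) {
        this.maxActions = maxActions;
        this.maxBytes = maxBytes;
        this.flusher = flusher;
    }

    /** Buffers one action and flushes if either threshold has been reached. */
    void add(String action) {
        pending.add(action);
        pendingBytes += action.getBytes(StandardCharsets.UTF_8).length;
        if (pending.size() >= maxActions || pendingBytes >= maxBytes) {
            flush();
        }
    }

    /** Hands the buffered actions to the flusher; call once more at the end of the load. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = pending;
        pending = new ArrayList<>();
        pendingBytes = 0;
        flusher.accept(batch);
    }

    public static void main(String[] args) {
        // Same limits as the defaults quoted above: 1000 actions or 5 MB.
        ThresholdBuffer buffer = new ThresholdBuffer(1000, 5L * 1024 * 1024,
                batch -> System.out.println("flushing " + batch.size() + " actions"));
        for (int i = 0; i < 2500; i++) {
            buffer.add("{\"n\":" + i + "}");
        }
        buffer.flush(); // send the remainder
    }
}
```

A time-based flush interval, which the real class also offers, is deliberately left out to keep the sketch short.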