How to prevent Elasticsearch from index throttling?


The setting that corresponds to maxNumMerges in the log file is index.merge.scheduler.max_merge_count. Increasing it together with index.merge.scheduler.max_thread_count (keeping max_thread_count <= max_merge_count) raises the number of simultaneous merges allowed for segments within an individual index's shards.
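
As a rough illustration of that constraint, here is a minimal shell sketch. The function name merge_scheduler_body is made up for this example; only the two setting names come from the answer. It builds the request body and refuses values where max_thread_count exceeds max_merge_count.

```shell
# Hypothetical helper (not part of Elasticsearch): builds the request body for
# the two merge scheduler settings and enforces
# max_thread_count <= max_merge_count.
merge_scheduler_body() {
  threads=$1
  merges=$2
  if [ "$threads" -gt "$merges" ]; then
    echo "max_thread_count ($threads) must be <= max_merge_count ($merges)" >&2
    return 1
  fi
  printf '{"settings":{"index.merge.scheduler.max_thread_count":%d,"index.merge.scheduler.max_merge_count":%d}}\n' "$threads" "$merges"
}

# Example call. Assumes a node at localhost:9200 and an index named index_name:
# curl -XPUT "http://localhost:9200/index_name/_settings" -d "$(merge_scheduler_body 6 6)"
```

The curl line is left commented out because the host and index name are assumptions. Elasticsearch 6.0 and later also require a -H 'Content-Type: application/json' header on such requests.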

If you have a very high indexing rate that results in many GBs in a single index, you probably want to revisit some of the other assumptions that the Elasticsearch default settings make about segment size, too. Try raising floor_segment (segments smaller than this are treated as being this size when merges are selected, which keeps tiny segments from piling up), max_merged_segment (the maximum size of a single merged segment), and segments_per_tier (the number of segments of roughly equal size allowed before they are merged into a new tier). On an application with a high indexing rate and finished index sizes of roughly 120GB with 10 shards per index, we use the following settings:

# Assumes a node listening on localhost:9200; replace index_name with your index.
curl -XPUT 'http://localhost:9200/index_name/_settings' -d '{
  "settings": {
    "index.merge.policy.max_merge_at_once": 10,
    "index.merge.scheduler.max_thread_count": 10,
    "index.merge.scheduler.max_merge_count": 10,
    "index.merge.policy.floor_segment": "100mb",
    "index.merge.policy.segments_per_tier": 25,
    "index.merge.policy.max_merged_segment": "10gb"
  }
}'

Also, one important thing you can do to shorten recovery time after a node is lost or restarted on applications with high indexing rates is to take advantage of index recovery prioritization (the index.priority setting, in ES >= 1.7). Tune this setting so that the indices that receive the most indexing activity are recovered first. As you may know, the "normal" shard initialization process just copies the already-indexed segment files between nodes. However, if indexing activity occurs against a shard before or during initialization, the translog holding the new documents can become very large. When merging goes through the roof during recovery, the replay of this translog against the shard is almost always the culprit. By using index recovery prioritization to recover those shards first and delay shards with less indexing activity, you can minimize the eventual size of the translog, which will dramatically improve recovery time.
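
A minimal sketch of how the priority could be set, assuming the index.priority setting from the ES 1.7+ documentation (higher values are recovered first). The helper name index_priority_body and the index names busy_index and quiet_index are invented for illustration.

```shell
# Hypothetical helper: builds a settings body that sets index.priority.
# Indices with a higher priority value are recovered before lower ones.
index_priority_body() {
  printf '{"settings":{"index.priority":%d}}\n' "$1"
}

# Example calls. Assume a node at localhost:9200; both index names are made up:
# curl -XPUT "http://localhost:9200/busy_index/_settings" -d "$(index_priority_body 10)"
# curl -XPUT "http://localhost:9200/quiet_index/_settings" -d "$(index_priority_body 1)"
```

The actual values matter only relative to each other, so any ordering that puts the heavily indexed indices on top should have the same effect.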


We are using 1.7 and noticed a similar problem: indexing was being throttled even though IO was not saturated (Fusion IO in our case).

After increasing "index.merge.scheduler.max_thread_count" the problem seems to be gone; we have not seen any more throttling logged so far.

I would try setting "index.merge.scheduler.max_thread_count" to at least the max reported numMergesInFlight (6 in the logs above).

https://www.elastic.co/guide/en/elasticsearch/reference/1.7/index-modules-merge.html#scheduling
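
To find the highest numMergesInFlight value without reading the log by hand, something like the following sketch could work. It assumes only that each throttling line contains the text numMergesInFlight=<n>; the function name max_merges_in_flight and the log path in the example are hypothetical.

```shell
# Reads log lines on stdin and prints the largest numMergesInFlight value seen.
# Assumes throttling lines contain "numMergesInFlight=<n>"; prints nothing if
# no such line is found.
max_merges_in_flight() {
  grep -o 'numMergesInFlight=[0-9][0-9]*' | cut -d= -f2 | sort -n | tail -n 1
}

# Example call. The path is a placeholder for wherever your node writes its log:
# max_merges_in_flight < /path/to/elasticsearch.log
```

The printed number is then a reasonable lower bound for "index.merge.scheduler.max_thread_count", as suggested above.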

Hope this helps!


Have you looked into increasing the shard allocation delay to give the node time to recover before the master starts promoting replicas?

https://www.elastic.co/guide/en/elasticsearch/reference/current/delayed-allocation.html
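
For reference, the setting described on that page is index.unassigned.node_left.delayed_timeout. A minimal sketch of raising it follows; the helper name delayed_allocation_body and the 5m value are only illustrative, and the right timeout depends on how long your nodes take to come back.

```shell
# Hypothetical helper: builds a settings body that sets the delayed allocation
# timeout. The argument is a time value such as 5m.
delayed_allocation_body() {
  printf '{"settings":{"index.unassigned.node_left.delayed_timeout":"%s"}}\n' "$1"
}

# Example call. Assumes a node at localhost:9200; _all applies it to every index:
# curl -XPUT "http://localhost:9200/_all/_settings" -d "$(delayed_allocation_body 5m)"
```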