ElasticSearch - Determining maximum shard size ElasticSearch - Determining maximum shard size elasticsearch elasticsearch

ElasticSearch - Determining maximum shard size


To test this myself, I indexed all the English articles in Wikipedia (without any history information) in a single elasticsearch shard. The elasticsearch data folder grew to ~42GB at the end of the test. Lessons learned are:

  • indexing speed will not be affected by the size of the shard. Mind you, I did not try indexing with more than one thread at a time, but single thread indexing speed was more or less constant for the duration of the test
  • querying speed on the other hand was drastically affected by shard size. Especially once you try to query with more than one user at a time. The exact numbers will depend heavily on the power of your machine, data structure and how many threads are querying. To give you an idea, with elasticsearch running on my dev machine, querying the Wikipedia shard with 25 concurrent users resulted in an average response time of 3.5 seconds (with peaks towards half a minute).

My conclusion is that a shard too large will not make elasticsearch fail just from indexing. Querying the large shard may be too slow for your needs, or, in certain situations, even break elasticsearch with an OutOfMemoryException (for example a big faceted query).

This answer is based on my own investigation. Full story can be read on my blog:

http://blog.trifork.com/2013/09/26/maximum-shard-size-in-elasticsearch/
http://blog.trifork.com/2013/11/05/maximum-shard-size-in-elasticsearch-revisited/