
ElasticSearch circuit_breaking_exception (Data too large) with significant_terms aggregation


I am not sure what you are trying to do, but I'm curious to find out. Since you're getting that exception, I can assume the cardinality of that field is not small. I guess you are basically trying to see the relationships between all the terms in that field, based on significance.

The first significant_terms aggregation will consider all the terms in that field and establish how "significant" they are (calculating the frequency of each term in the whole index and then comparing it with the frequency in the set of documents matched by your range query).

After it does that (for all the terms), you want a second significant_terms aggregation that should repeat that first step, but now for each of those terms individually. That's gonna be painful. Basically, you are computing number_of_terms * number_of_terms significant_terms calculations.
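For illustration, the shape I'm assuming you have is roughly the following (my_index, my_field and the range query are placeholders, since the original request isn't shown):

POST /my_index/_search
{
  "size": 0,
  "query": {
    "range": { "timestamp": { "gte": "now-1d" } }
  },
  "aggs": {
    "top_terms": {
      "significant_terms": { "field": "my_field" },
      "aggs": {
        "related_terms": {
          "significant_terms": { "field": "my_field" }
        }
      }
    }
  }
}

Every bucket produced by the outer aggregation triggers a full significance computation for the inner one, which is where the memory goes.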

The big question is what are you trying to do?

If you want to see a relationship between all the terms in that field, that's gonna be expensive for the reasons explained above. My suggestion is to run a first significant_terms aggregation, take the first 10 terms or so, and then run a second query with another significant_terms aggregation, but limit the terms by using a parent terms aggregation that includes only those 10 from the first query.
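A rough sketch of that second query (index, field and terms are placeholders; note that on 1.x the include parameter takes a regex pattern rather than an array of exact terms):

POST /my_index/_search
{
  "size": 0,
  "query": {
    "range": { "timestamp": { "gte": "now-1d" } }
  },
  "aggs": {
    "limited_terms": {
      "terms": {
        "field": "my_field",
        "include": ["term1", "term2", "term3"]
      },
      "aggs": {
        "related_terms": {
          "significant_terms": { "field": "my_field" }
        }
      }
    }
  }
}

This way the outer aggregation produces only those 10 buckets instead of one per distinct term, so only 10 inner significance computations run.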

You can also take a look at the sampler aggregation and use that as a parent for a single significant_terms aggregation.
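For example, something along these lines (the shard_size value is just a starting point to tune for your data; index and field names are placeholders):

POST /my_index/_search
{
  "size": 0,
  "query": {
    "range": { "timestamp": { "gte": "now-1d" } }
  },
  "aggs": {
    "sample": {
      "sampler": { "shard_size": 200 },
      "aggs": {
        "top_terms": {
          "significant_terms": { "field": "my_field" }
        }
      }
    }
  }
}

The sampler restricts the analysis to the top-scoring documents on each shard, which keeps the memory footprint bounded.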

Also, I don't think increasing the circuit breaker limit is the real solution. Those limits were chosen for a reason. You can increase it and maybe it will work, but it should make you ask yourself whether that's the right query for your use case (and it doesn't sound like it is). The limit value shown in the exception might not be the final one either... reused_arrays refers to an array class in Elasticsearch that is resizable, so if more elements are needed, the array size is increased and you may hit the circuit breaker again, for another value.


You can try to increase the request circuit breaker limit to 41% (default is 40%) in your elasticsearch.yml config file and restart your cluster:

indices.breaker.request.limit: 41%

Or, if you prefer not to restart your cluster, you can change the setting dynamically using:

curl -XPUT localhost:9200/_cluster/settings -d '{
  "persistent" : {
    "indices.breaker.request.limit" : "41%"
  }
}'

Judging by the numbers showing up (i.e. "bytes_limit": 6844055552, "bytes_wanted": 6844240272), you're just missing ~180 KB of heap, so increasing the limit by 1% to 41% should give you ~171 MB of additional heap (your total heap = ~17 GB) for your request breaker, which should be more than sufficient.

Just make sure not to increase this value too high, as you run the risk of going OOM, since the request circuit breaker shares the heap with the fielddata circuit breaker and other components.


Circuit breakers are designed to deal with situations where request processing needs more memory than is available. You can set the limit with the following request:

PUT /_cluster/settings
{
  "persistent" : {
    "indices.breaker.request.limit" : "45%"
  }
}

You can find more information at:

https://www.elastic.co/guide/en/elasticsearch/reference/current/circuit-breaker.html
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/index-modules-fielddata.html