Very slow elasticsearch term aggregation. How to improve?

elasticsearch aggregate query-performance

Thanks again for the effort.

Finally we have solved the main problem and our performance is back to normal.

In short, we did the following:
- updated the mapping for default_group_field to be of type long
- compressed the default_group_field values so that they fit type long

Some explanations:

Aggregations on string fields require some extra work to be done on them. As we saw from the logs, building global ordinals for a field with very high cardinality was very expensive. In fact, aggregations are the only thing we do on this field, so using the String type for it is not very efficient.

So we have changed the mapping to:

default_group_field: {  type: 'long',  index: 'not_analyzed'}

This way we do not touch those expensive operations.
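The "compression" step can be sketched as packing the parts that were previously joined with dots into a single integer that fits a long. This is only a minimal sketch: it assumes the key is built from duration, start date, adults and kids, and the bit widths, the reference date and the function names are all hypothetical. The actual encoding we used is not shown here.

```python
from datetime import date, timedelta

# Hypothetical layout: (name, bit width). Widths are assumptions; size them
# to the real value ranges. The total must stay below 64 bits to fit a long.
EPOCH = date(2000, 1, 1)  # assumed reference date for start_date
FIELDS = (("duration", 8), ("start_days", 20), ("adults", 4), ("kids", 4))


def pack_group_key(duration, start_date, adults, kids):
    """Pack the four parts into one non-negative integer."""
    values = (duration, (start_date - EPOCH).days, adults, kids)
    key = 0
    for (name, bits), value in zip(FIELDS, values):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
        key = (key << bits) | value
    return key


def unpack_group_key(key):
    """Reverse pack_group_key; fields are read back last-to-first."""
    values = []
    for _name, bits in reversed(FIELDS):
        values.append(key & ((1 << bits) - 1))
        key >>= bits
    duration, start_days, adults, kids = reversed(values)
    return duration, EPOCH + timedelta(days=start_days), adults, kids


key = pack_group_key(7, date(2016, 5, 1), 2, 1)
```

Because every part has a fixed width, two different combinations can never produce the same key, and the original values can be recovered with unpack_group_key when the aggregation buckets come back.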

After this change, the same query's timing dropped to ~100ms. CPU usage also went down.

PS 1

I got a lot of information from the docs on global ordinals.

PS 2

I still have no idea how to work around this issue while keeping the field as type String. Please comment if you have any ideas.


This is likely due to the default behaviour of terms aggregations, which requires global ordinals to be built. This computation can be expensive for high-cardinality fields.

The following blog addresses the likely cause of this poor performance and several approaches to resolve it.

https://www.elastic.co/blog/improving-the-performance-of-high-cardinality-terms-aggregations-in-elasticsearch
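For the case where the field has to stay a string, the terms aggregation accepts an execution_hint parameter, and the value "map" aggregates on the field values directly instead of going through global ordinals. Whether this helps depends on how many documents match the query (it is generally suggested only when few do), so treat it as something to benchmark rather than a guaranteed fix. Below is a minimal sketch that only builds the request body. The helper name terms_agg_body and the aggregation name "offers" are illustrative, not part of any API.

```python
import json


def terms_agg_body(field, execution_hint=None, size=10):
    """Build a search body with a single terms aggregation on `field`."""
    terms = {"field": field, "size": size}
    if execution_hint is not None:
        # "map" asks Elasticsearch to skip global ordinals for this aggregation.
        terms["execution_hint"] = execution_hint
    return {"size": 0, "aggs": {"offers": {"terms": terms}}}


body = terms_agg_body("default_group_field", execution_hint="map")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a normal _search request. Comparing timings with and without the hint on the real query shows whether it is worth keeping.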


OK, I will try to answer this. There are a few parts of the question that I was not able to understand, like:

To avoid sub-aggregations we have united target fields values into one called default_group_field by joining them with dot(.)

I am not sure what you really mean by this, because you said that you added this field to avoid aggregation. (But how? And how are you avoiding the aggregation by joining the values with a dot?)

I am also new to Elasticsearch, so if there is anything I missed, you can comment on this answer. Thanks.

I will continue to answer this question.

Before that: I am assuming that you have the default_group_field field to differentiate between records by duration, start_date, adults and kids.

I will try to provide one example below after my solution.

My solution:

    {
      "size": 0,
      "aggs": {
        "offers": {
          "terms": {
            "field": "default_group_field"
          },
          "aggs": {
            "sort_cost_asc": {
              "top_hits": {
                "sort": [
                  { "cost": { "order": "asc" } }
                ],
                "_source": {
                  "include": [ ... fields you want from the document ... ]
                },
                "size": 1
              }
            }
          }
        }
      },
      "query": {"... your query part ..."}
    }

I will try to explain what I am trying to do here:

I am assuming that your documents look like this (maybe there is some nesting as well, but I am keeping the example documents as simple as I can):

document1:

{"default_group_field": "kids","cost": 100,"documentId":1}

document2:

{"default_group_field": "kids","cost": 120,"documentId":2}

document3:

{"default_group_field": "adults","cost": 50,"documentId":3}

document4:

{"default_group_field": "adults","cost": 150,"documentId":4}

So now you have these documents and you want to get the minimum-cost document for both adults and kids.

Your query should then look like this:

    {
      "size": 0,
      "aggs": {
        "offers": {
          "terms": {
            "field": "default_group_field"
          },
          "aggs": {
            "sort_cost_asc": {
              "top_hits": {
                "sort": [
                  { "cost": { "order": "asc" } }
                ],
                "_source": {
                  "include": ["documentId", "cost", "default_group_field"]
                },
                "size": 1
              }
            }
          }
        }
      },
      "query": {
        "filtered": { "query": { "match_all": {} } }
      }
    }

To explain the above query: I am grouping the documents by "default_group_field", then sorting each group by cost in ascending order, and size: 1 returns just one document per group.

Therefore the result of this query will be the minimum-cost document in each category (adults and kids).
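To check the expected result without a cluster, the same grouping can be reproduced in plain Python. This is only a local sketch of what a terms aggregation with a top_hits sub-aggregation (size 1, sorted by cost ascending) should return for the four sample documents. cheapest_per_group is a hypothetical helper, not an Elasticsearch API.

```python
# The four sample documents from this answer.
docs = [
    {"default_group_field": "kids", "cost": 100, "documentId": 1},
    {"default_group_field": "kids", "cost": 120, "documentId": 2},
    {"default_group_field": "adults", "cost": 50, "documentId": 3},
    {"default_group_field": "adults", "cost": 150, "documentId": 4},
]


def cheapest_per_group(documents, group_field="default_group_field"):
    """Return {group value: document with the lowest cost in that group}."""
    cheapest = {}
    for doc in documents:
        group = doc[group_field]
        # Keep the first document seen, or replace it with a cheaper one.
        if group not in cheapest or doc["cost"] < cheapest[group]["cost"]:
            cheapest[group] = doc
    return cheapest


result = cheapest_per_group(docs)
for group, doc in result.items():
    print(group, doc["documentId"], doc["cost"])
```

For the sample data this selects documentId 1 (cost 100) for kids and documentId 3 (cost 50) for adults, which is what the top hit of each bucket should contain.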

Usually when I write a query for Elasticsearch or a database, I try to minimize the number of documents or rows.

I assume that I have understood your question correctly. If I have misunderstood it or made a mistake, please reply and let me know where I went wrong.

Thanks,

CodeHunter
