Elasticsearch distinct filter values


This is a job for a terms aggregation (see the Elasticsearch documentation).

You can get the distinct department values like this:

POST company/employee/_search
{
  "size": 0,
  "aggs": {
    "by_departments": {
      "terms": {
        "field": "departments.name",
        "size": 0 // see note 1
      }
    }
  }
}

For your example, this outputs:

{
   ...
   "aggregations": {
      "by_departments": {
         "buckets": [
            {
               "key": "management", // see note 2
               "doc_count": 2
            },
            {
               "key": "accounts",
               "doc_count": 1
            },
            {
               "key": "it",
               "doc_count": 1
            }
         ]
      }
   }
}
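If you read this response from client code, the distinct values are simply the bucket keys. Here is a minimal Python sketch, assuming the response body has already been parsed into a dict; `distinct_values` is a hypothetical helper name, not part of any Elasticsearch client.

```python
def distinct_values(response, agg_name="by_departments"):
    """Return the bucket keys of a terms aggregation, in response order."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Parsed form of the response shown above (other top-level fields omitted).
response = {
    "aggregations": {
        "by_departments": {
            "buckets": [
                {"key": "management", "doc_count": 2},
                {"key": "accounts", "doc_count": 1},
                {"key": "it", "doc_count": 1},
            ]
        }
    }
}

print(distinct_values(response))
```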

Two additional notes:

  • Note 1: setting size to 0 raises the maximum number of buckets to Integer.MAX_VALUE. Don't use it if the field has a very large number of distinct department values.
  • Note 2: the keys are the terms produced by analyzing the department values, not the original strings. Make sure you run your terms aggregation on a field mapped as not_analyzed.

For example, with the default mapping (departments.name is an analyzed string), adding this employee:

{
  "name": "Bill Gates",
  "departments": [
    {
      "name": "IT"
    },
    {
      "name": "Human Resource"
    }
  ]
}

will cause this kind of result:

{
   ...
   "aggregations": {
      "by_departments": {
         "buckets": [
            {
               "key": "it",
               "doc_count": 2
            },
            {
               "key": "management",
               "doc_count": 2
            },
            {
               "key": "accounts",
               "doc_count": 1
            },
            {
               "key": "human",
               "doc_count": 1
            },
            {
               "key": "resource",
               "doc_count": 1
            }
         ]
      }
   }
}
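To see why "Human Resource" ends up as two buckets, here is a rough Python simulation of how bucket counts differ between an analyzed and a not_analyzed field. It is only a sketch: analysis is approximated by lowercasing and splitting on whitespace, which is far cruder than a real Elasticsearch analyzer, and the first two employees are hypothetical stand-ins chosen to match the counts above. `doc_counts` is a made-up helper name.

```python
from collections import Counter


def doc_counts(employees, analyzed):
    """Count, per term, how many employees contain that term."""
    counts = Counter()
    for employee in employees:
        terms = set()  # a term counts once per document, like doc_count
        for department in employee["departments"]:
            name = department["name"]
            if analyzed:
                # Crude stand-in for analysis: lowercase, split on whitespace.
                terms.update(name.lower().split())
            else:
                # not_analyzed: the whole value is indexed as a single term.
                terms.add(name)
        counts.update(terms)
    # Highest count first, ties by key, to mirror the listings in this answer.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


employees = [
    # Hypothetical employees standing in for the question's data.
    {"name": "Employee A", "departments": [{"name": "Management"}, {"name": "Accounts"}]},
    {"name": "Employee B", "departments": [{"name": "Management"}, {"name": "IT"}]},
    {"name": "Bill Gates", "departments": [{"name": "IT"}, {"name": "Human Resource"}]},
]

print(doc_counts(employees, analyzed=True))
print(doc_counts(employees, analyzed=False))
```

The first call splits "Human Resource" into "human" and "resource"; the second keeps it as one value with its original casing.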

With a correct mapping:

POST company
{
  "mappings": {
    "employee": {
      "properties": {
        "name": {
          "type": "string"
        },
        "departments": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

the same request outputs:

{
   ...
   "aggregations": {
      "by_departments": {
         "buckets": [
            {
               "key": "IT",
               "doc_count": 2
            },
            {
               "key": "Management",
               "doc_count": 2
            },
            {
               "key": "Accounts",
               "doc_count": 1
            },
            {
               "key": "Human Resource",
               "doc_count": 1
            }
         ]
      }
   }
}

Hope this helps!