Elasticsearch distinct filter values
This is a job for a terms aggregation (see the documentation).
You can get the distinct departments values like this:

    POST company/employee/_search
    {
      "size": 0,
      "aggs": {
        "by_departments": {
          "terms": {
            "field": "departments.name",
            "size": 0 //see note 1
          }
        }
      }
    }
Which, in your example, outputs:

    {
      ...
      "aggregations": {
        "by_departments": {
          "buckets": [
            {
              "key": "management", //see note 2
              "doc_count": 2
            },
            {
              "key": "accounts",
              "doc_count": 1
            },
            {
              "key": "it",
              "doc_count": 1
            }
          ]
        }
      }
    }
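On the client side, the distinct values are simply the bucket keys. Here is a minimal Python sketch, assuming the response has already been parsed into a dict. The helper names `build_distinct_values_query` and `distinct_values`, and the `max_buckets` default of 100, are hypothetical and not part of any Elasticsearch client:

```python
from typing import Any, Dict, List


def build_distinct_values_query(field: str, max_buckets: int = 100) -> Dict[str, Any]:
    """Build a search body that returns no hits, only the terms aggregation."""
    return {
        "size": 0,  # top-level size: skip the hits, keep the aggregations
        "aggs": {
            "by_departments": {
                # aggregation-level size: maximum number of buckets returned
                "terms": {"field": field, "size": max_buckets}
            }
        },
    }


def distinct_values(response: Dict[str, Any], agg_name: str = "by_departments") -> List[str]:
    """Return the bucket keys of a terms aggregation, in response order."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Response shaped like the output shown above.
sample_response = {
    "aggregations": {
        "by_departments": {
            "buckets": [
                {"key": "management", "doc_count": 2},
                {"key": "accounts", "doc_count": 1},
                {"key": "it", "doc_count": 1},
            ]
        }
    }
}

print(distinct_values(sample_response))
```

The body returned by `build_distinct_values_query("departments.name")` can be sent to the `_search` endpoint with whatever HTTP or Elasticsearch client you already use.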
Two additional notes:
- Setting size to 0 sets the maximum number of buckets to Integer.MAX_VALUE. Don't use it if departments has too many distinct values.
- As you can see, the keys are the terms produced by analyzing the departments values. Be sure to run your terms aggregation on a field mapped as not_analyzed.
For example, with our default mapping (departments.name is an analyzed string), adding this employee:

    {
      "name": "Bill Gates",
      "departments": [
        { "name": "IT" },
        { "name": "Human Resource" }
      ]
    }
will cause this kind of result:

    {
      ...
      "aggregations": {
        "by_departments": {
          "buckets": [
            {
              "key": "it",
              "doc_count": 2
            },
            {
              "key": "management",
              "doc_count": 2
            },
            {
              "key": "accounts",
              "doc_count": 1
            },
            {
              "key": "human",
              "doc_count": 1
            },
            {
              "key": "resource",
              "doc_count": 1
            }
          ]
        }
      }
    }
With a correct mapping:

    POST company
    {
      "mappings": {
        "employee": {
          "properties": {
            "name": {
              "type": "string"
            },
            "departments": {
              "type": "object",
              "properties": {
                "name": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      }
    }
The same request ends up outputting:

    {
      ...
      "aggregations": {
        "by_departments": {
          "buckets": [
            {
              "key": "IT",
              "doc_count": 2
            },
            {
              "key": "Management",
              "doc_count": 2
            },
            {
              "key": "Accounts",
              "doc_count": 1
            },
            {
              "key": "Human Resource",
              "doc_count": 1
            }
          ]
        }
      }
    }
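To see why the mapping changes the buckets, here is a rough Python simulation that needs no cluster. It assumes the default analyzer behaves roughly like "lowercase, then split on non-alphanumeric characters", which is only an approximation of the real standard analyzer. The names `analyzed_terms` and `bucket_counts` are hypothetical, and the only document used is the Bill Gates employee from above:

```python
import re
from collections import Counter
from typing import Any, Dict, List


def analyzed_terms(value: str) -> List[str]:
    """Approximate an analyzed string: lowercase, split on non-alphanumerics."""
    return [term for term in re.split(r"[^0-9a-z]+", value.lower()) if term]


def bucket_counts(docs: List[Dict[str, Any]], analyzed: bool) -> Dict[str, int]:
    """Count, per term, how many documents contain it (like doc_count)."""
    counts: Counter = Counter()
    for doc in docs:
        terms = set()  # a document counts at most once per term
        for department in doc["departments"]:
            name = department["name"]
            terms.update(analyzed_terms(name) if analyzed else [name])
        counts.update(terms)
    return dict(counts)


bill = {
    "name": "Bill Gates",
    "departments": [{"name": "IT"}, {"name": "Human Resource"}],
}

# Analyzed: "Human Resource" is split into two separate lowercase terms.
print(bucket_counts([bill], analyzed=True))
# not_analyzed: each department name is kept as a single exact term.
print(bucket_counts([bill], analyzed=False))
```

With `analyzed=True`, the single document contributes to three buckets (`it`, `human`, `resource`). With `analyzed=False`, it contributes to two (`IT`, `Human Resource`), which matches the difference between the two outputs above.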
Hope this helps!