Elasticsearch terms aggregation by strings in an array
I think all you're missing is "states.raw" in your aggregation. Note that, since no analyzer is specified, the "states" field is analyzed with the standard analyzer, while the sub-field "raw" is "not_analyzed". Your mapping might bear looking at as well: when I tried it against ES 2.0 I got some errors, but this worked:
PUT /test_index
{
  "mappings": {
    "doc": {
      "properties": {
        "states": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
Then I added a couple of docs:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"states":["New York","New Jersey","California"]}
{"index":{"_id":2}}
{"states":["New York","North Carolina","North Dakota"]}
And this query seems to do what you want:
POST /test_index/_search
{
  "size": 0,
  "aggs": {
    "states": {
      "terms": {
        "field": "states.raw",
        "size": 10
      }
    }
  }
}
returning:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "states": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "New York", "doc_count": 2 },
        { "key": "California", "doc_count": 1 },
        { "key": "New Jersey", "doc_count": 1 },
        { "key": "North Carolina", "doc_count": 1 },
        { "key": "North Dakota", "doc_count": 1 }
      ]
    }
  }
}
Here's the code I used to test it:
http://sense.qbox.io/gist/31851c3cfee8c1896eb4b53bc1ddd39ae87b173e
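If it helps to see why the field choice matters, here is a rough Python sketch (not Elasticsearch code) of what a terms aggregation would bucket on for "states" versus "states.raw", using the two docs above. The helper names `analyzed_terms` and `terms_agg` are made up for illustration, the lowercase-and-split tokenizer is only a crude stand-in for the standard analyzer, and the tie-break on key is an assumption chosen to match the response shown earlier.

```python
import re
from collections import Counter

# The two documents indexed above.
docs = [
    {"states": ["New York", "New Jersey", "California"]},
    {"states": ["New York", "North Carolina", "North Dakota"]},
]

def analyzed_terms(value):
    # Crude approximation of the standard analyzer (an assumption):
    # lowercase the value and split it on non-word characters.
    return [t for t in re.split(r"\W+", value.lower()) if t]

def terms_agg(docs, to_terms):
    # Hypothetical helper: count how many documents contain each term.
    counts = Counter()
    for doc in docs:
        # A set, so each document counts at most once per term (doc_count).
        terms = {t for value in doc["states"] for t in to_terms(value)}
        counts.update(terms)
    # Highest doc_count first; ties broken by key (assumed ordering).
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Aggregating on "states": buckets are individual lowercase tokens.
analyzed = terms_agg(docs, analyzed_terms)
# Aggregating on "states.raw": each array element is one untouched term.
raw = terms_agg(docs, lambda value: [value])

print(analyzed)
print(raw)
```

With the analyzed field you end up with buckets like "new" and "york" instead of "New York", which is the symptom the "raw" sub-field avoids.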