Elasticsearch: return deduplicated results, with a custom sort bubbling the best to the top
Try this approach:
- your mapping should also store a not_analyzed version of your title, so that the buckets are built from the full title rather than from the individual terms that make up the title:
{
  "mappings": {
    "engineers": {
      "properties": {
        "title": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "content": { "type": "string" },
        "weighted_importance": { "type": "integer" }
      }
    }
  }
}
- group the results into buckets built on the title.raw field defined above
- define a top_hits sub-aggregation to bring back the "best" document for each bucket
- define another sub-aggregation, on the same level as the top_hits one, that should be a max aggregation computing the maximum weighted_importance
- in the main aggregation, use the max above to sort the resulting buckets
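To make the intent of these steps concrete, here is a minimal pure-Python sketch of the same grouping logic, run against an in-memory list instead of an index. The sample documents and the function name dedupe_by_title are hypothetical; only the field names title and weighted_importance come from the mapping above.

```python
from collections import defaultdict

# Hypothetical sample documents; only the field names come from the mapping.
docs = [
    {"title": "Software Engineer", "weighted_importance": 3},
    {"title": "Software Engineer", "weighted_importance": 9},
    {"title": "Data Engineer", "weighted_importance": 5},
    {"title": "Data Engineer", "weighted_importance": 2},
    {"title": "Site Reliability Engineer", "weighted_importance": 7},
]


def dedupe_by_title(documents):
    """Return one document per distinct title, best bucket first."""
    # Step 1: bucket on the full title (the role of the terms agg on title.raw).
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc["title"]].append(doc)

    results = []
    for members in buckets.values():
        # Step 2: keep the single best document (top_hits with size 1).
        first_match = max(members, key=lambda d: d["weighted_importance"])
        # Step 3: compute the bucket-level maximum (the max sub-aggregation).
        best_hit = max(d["weighted_importance"] for d in members)
        results.append((best_hit, first_match))

    # Step 4: order the buckets by that maximum, descending.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in results]


deduped = dedupe_by_title(docs)
for doc in deduped:
    print(doc["title"], doc["weighted_importance"])
```

This only illustrates what the aggregation is meant to compute; in Elasticsearch the grouping and sorting happen on the server.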
GET /my_index/engineers/_search?search_type=count
{
  "query": {
    "match": {
      "title": "Engineer"
    }
  },
  "aggs": {
    "title": {
      "terms": {
        "field": "title.raw",
        "order": { "best_hit": "desc" }
      },
      "aggs": {
        "first_match": {
          "top_hits": {
            "sort": [ { "weighted_importance": { "order": "desc" } } ],
            "size": 1
          }
        },
        "best_hit": {
          "max": {
            "lang": "groovy",
            "script": "doc['weighted_importance'].value"
          }
        }
      }
    }
  }
}
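Because the documents come back inside the aggregation rather than in the regular hits list, the client has to unpack them from the buckets. The sketch below assumes the response has already been parsed into a Python dict and that the aggregation names match the query above (title, first_match, best_hit). The function name extract_deduped_hits and the sample response are hypothetical, trimmed down for illustration.

```python
def extract_deduped_hits(response):
    """Return the top document of each bucket, in the order the buckets arrive."""
    docs = []
    for bucket in response["aggregations"]["title"]["buckets"]:
        # Each bucket carries its top_hits result under the sub-aggregation name.
        hits = bucket["first_match"]["hits"]["hits"]
        if hits:
            docs.append(hits[0]["_source"])
    return docs


# Hypothetical, trimmed-down response body; the values are made up.
sample_response = {
    "aggregations": {
        "title": {
            "buckets": [
                {
                    "key": "Software Engineer",
                    "doc_count": 2,
                    "best_hit": {"value": 9.0},
                    "first_match": {"hits": {"total": 2, "hits": [
                        {"_id": "2", "_source": {
                            "title": "Software Engineer",
                            "weighted_importance": 9}},
                    ]}},
                },
                {
                    "key": "Data Engineer",
                    "doc_count": 2,
                    "best_hit": {"value": 5.0},
                    "first_match": {"hits": {"total": 2, "hits": [
                        {"_id": "3", "_source": {
                            "title": "Data Engineer",
                            "weighted_importance": 5}},
                    ]}},
                },
            ]
        }
    }
}

unique_docs = extract_deduped_hits(sample_response)
for doc in unique_docs:
    print(doc["title"], doc["weighted_importance"])
```

Note that a terms aggregation returns only ten buckets by default, so if more distinct titles are expected, a larger "size" would need to be set on the terms aggregation.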