Word-oriented completion suggester (ElasticSearch 5.x)


As hinted at in the comment, another way of achieving this without getting duplicate documents is to create a sub-field of the autocomplete field that contains edge n-grams of the full name. First you define your mapping like this:

PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "completion_analyzer": {
          "type": "custom",
          "filter": [
            "lowercase",
            "completion_filter"
          ],
          "tokenizer": "keyword"
        }
      },
      "filter": {
        "completion_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 24
        }
      }
    }
  },
  "mappings": {
    "users": {
      "properties": {
        "autocomplete": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            },
            "completion": {
              "type": "text",
              "analyzer": "completion_analyzer",
              "search_analyzer": "standard"
            }
          }
        },
        "firstName": {
          "type": "text"
        },
        "lastName": {
          "type": "text"
        }
      }
    }
  }
}
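To see what ends up in the autocomplete.completion sub-field, here is a rough Python approximation of the analysis chain from the mapping (keyword tokenizer, then lowercase, then edge_ngram with min_gram 1 and max_gram 24). The function name completion_analyzer is only illustrative, and Python's str.lower() stands in for the Lucene lowercase filter, so treat it as a sketch rather than the exact analyzer:

```python
def completion_analyzer(text, min_gram=1, max_gram=24):
    """Approximate the index-time analysis of autocomplete.completion."""
    # keyword tokenizer: the whole input stays a single token
    token = text
    # lowercase filter
    token = token.lower()
    # edge_ngram filter: every prefix of the token, min_gram..max_gram characters long
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


if __name__ == "__main__":
    print(completion_analyzer("John Doe"))
    # ['j', 'jo', 'joh', 'john', 'john ', 'john d', 'john do', 'john doe']
```

Because the tokenizer is keyword, the space is part of the token, which is why a prefix such as "john d" is indexed as a single term.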

Then you index a few documents:

POST my-index/users/_bulk
{"index":{}}
{ "firstName": "John", "lastName": "Doe", "autocomplete": "John Doe" }
{"index":{}}
{ "firstName": "John", "lastName": "Deere", "autocomplete": "John Deere" }
{"index":{}}
{ "firstName": "Johnny", "lastName": "Cash", "autocomplete": "Johnny Cash" }

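Then you can query for john d and get one bucket for John Doe and another one for John Deere. Note that a term query is not analyzed, so the input has to be lowercased before it is sent: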

POST my-index/users/_search
{
  "size": 0,
  "query": {
    "term": {
      "autocomplete.completion": "john d"
    }
  },
  "aggs": {
    "suggestions": {
      "terms": {
        "field": "autocomplete.raw"
      }
    }
  }
}

Results:

{
  "aggregations": {
    "suggestions": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "John Doe",
          "doc_count": 1
        },
        {
          "key": "John Deere",
          "doc_count": 1
        }
      ]
    }
  }
}
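The reason duplicates disappear is that the terms aggregation buckets by the exact autocomplete.raw value, so several documents with the same name collapse into one bucket. A small in-memory Python sketch of that behaviour (no Elasticsearch involved; DOCS and suggestions are made-up names, and the prefix test stands in for the term lookup against the indexed edge n-grams):

```python
from collections import Counter

# Same sample data as the bulk request above (autocomplete values only).
DOCS = ["John Doe", "John Deere", "Johnny Cash"]


def suggestions(term, docs=DOCS, max_gram=24):
    """Mimic the term query plus the terms aggregation on autocomplete.raw."""
    # A term matches a document if it equals one of its lowercased edge n-grams,
    # i.e. it is a prefix of the lowercased value, 1..max_gram characters long.
    if not 1 <= len(term) <= max_gram:
        return {}
    matches = [value for value in docs if value.lower().startswith(term)]
    # terms aggregation: one bucket per distinct raw value, with its doc count
    return dict(Counter(matches))


if __name__ == "__main__":
    print(suggestions("john d"))
    # {'John Doe': 1, 'John Deere': 1}
    print(suggestions("john d", DOCS + ["John Doe"]))
    # {'John Doe': 2, 'John Deere': 1}
```

In the second call the extra John Doe document only raises doc_count; it does not add a second suggestion.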

UPDATE (June 25th, 2019):

ES 7.2 introduced a new data type called search_as_you_type that allows this kind of behavior natively. Read more at: https://www.elastic.co/guide/en/elasticsearch/reference/7.2/search-as-you-type.html
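For reference, a minimal sketch of what the request bodies could look like with search_as_you_type, written as Python dicts so they can be dumped to JSON. It assumes the same autocomplete field name as above and a 7.x index without mapping types; the _2gram and _3gram sub-fields and the bool_prefix multi_match type are the ones described on the linked page. As far as I can tell this still returns documents rather than unique words, so duplicate names would still need to be handled separately:

```python
import json

# Assumed 7.x mapping: mapping types such as "users" no longer exist in 7.x.
mapping = {
    "mappings": {
        "properties": {
            "autocomplete": {"type": "search_as_you_type"}
        }
    }
}


def search_body(text):
    """Build a bool_prefix query over the field and its shingle sub-fields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "bool_prefix",
                "fields": [
                    "autocomplete",
                    "autocomplete._2gram",
                    "autocomplete._3gram",
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
    print(json.dumps(search_body("john d"), indent=2))
```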


An additional option, skip_duplicates, will be added to the completion suggester in the next 6.x release.

From the docs at https://www.elastic.co/guide/en/elasticsearch/reference/master/search-suggesters-completion.html#skip_duplicates:

POST music/_search?pretty
{
    "suggest": {
        "song-suggest" : {
            "prefix" : "nor",
            "completion" : {
                "field" : "suggest",
                "skip_duplicates": true
            }
        }
    }
}


We face exactly the same problem. In Elasticsearch 2.4 the approach you describe used to work fine for us, but now, as you say, the suggester has become document-based, while we, like you, are only interested in unique words, not in the documents.

The only 'solution' we could think of so far is to create a separate index just for the words on which we want to perform the suggestion queries, and to make sure somehow that identical words are only indexed once in that index. The suggestion queries can then be run against this separate index. This is far from ideal, if only because we then need to make sure that this index remains in sync with the other index that we need for our other queries.
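One possible way to get the "indexed only once" property, sketched here under assumptions rather than something we have running: use the normalized word itself as the document _id, so that indexing the same word again overwrites the existing document instead of creating a second one. The index name suggest-words, the type word and the field word are all hypothetical; the helper only builds the newline-delimited body for a _bulk request and does not talk to a cluster:

```python
import json


def bulk_payload(words, index="suggest-words", doc_type="word"):
    """Build a _bulk body in which each word is its own document _id."""
    lines = []
    for word in words:
        cleaned = word.strip()
        if not cleaned:
            continue  # skip empty input
        # Same normalized word -> same _id -> the later document replaces the earlier one.
        action = {"index": {"_index": index, "_type": doc_type, "_id": cleaned.lower()}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"word": cleaned}))
    # The bulk API expects every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(bulk_payload(["John", "Doe", "john"]), end="")
```

This does not solve the syncing problem (removing a word once no source document uses it any more still needs separate bookkeeping), but it avoids duplicate entries without a lookup before each write.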