Word-oriented completion suggester (ElasticSearch 5.x)

elasticsearch autocomplete duplicates elasticsearch-5

As hinted at in the comment, another way to get word-oriented suggestions without duplicates is to give the autocomplete field a sub-field that holds edge n-grams of its value, query that sub-field, and aggregate on a keyword sub-field so that each distinct value comes back only once. First, define your mapping like this:

PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "completion_analyzer": {
          "type": "custom",
          "filter": [
            "lowercase",
            "completion_filter"
          ],
          "tokenizer": "keyword"
        }
      },
      "filter": {
        "completion_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 24
        }
      }
    }
  },
  "mappings": {
    "users": {
      "properties": {
        "autocomplete": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            },
            "completion": {
              "type": "text",
              "analyzer": "completion_analyzer",
              "search_analyzer": "standard"
            }
          }
        },
        "firstName": {
          "type": "text"
        },
        "lastName": {
          "type": "text"
        }
      }
    }
  }
}
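To see why this works, it helps to look at what completion_analyzer does to one value: the keyword tokenizer keeps the whole string as a single token, lowercase lowercases it, and the edge_ngram filter emits every prefix from min_gram to max_gram characters. The function below is a minimal pure-Python approximation of that chain (the name completion_tokens is hypothetical, and Python's str.lower is assumed to behave like the Lucene lowercase filter, which holds for plain ASCII input):

```python
from __future__ import annotations


def completion_tokens(text: str, min_gram: int = 1, max_gram: int = 24) -> list[str]:
    """Approximate the tokens completion_analyzer indexes for one field value."""
    # keyword tokenizer + lowercase filter: the whole value becomes one lowercase token
    token = text.lower()
    # edge_ngram filter: every prefix whose length is in [min_gram, max_gram]
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


if __name__ == "__main__":
    print(completion_tokens("John Doe"))
    # ['j', 'jo', 'joh', 'john', 'john ', 'john d', 'john do', 'john doe']
```

Because the prefixes keep the space, a multi-word prefix such as "john d" is itself an indexed term. Note that with max_gram set to 24, prefixes longer than 24 characters are never indexed, so they cannot match.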

Then you index a few documents:

POST my-index/users/_bulk
{"index":{}}
{ "firstName": "John", "lastName": "Doe", "autocomplete": "John Doe"}
{"index":{}}
{ "firstName": "John", "lastName": "Deere", "autocomplete": "John Deere" }
{"index":{}}
{ "firstName": "Johnny", "lastName": "Cash", "autocomplete": "Johnny Cash" }

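Then you can run a term query on autocomplete.completion with the prefix the user has typed and aggregate on autocomplete.raw, which gives one bucket per distinct value. Since a term query is not analyzed, pass the prefix in lowercase so it matches the lowercased n-grams. For example, querying for john d returns one bucket for John Doe and another for John Deere: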

POST my-index/_search
{
  "size": 0,
  "query": {
    "term": {
      "autocomplete.completion": "john d"
    }
  },
  "aggs": {
    "suggestions": {
      "terms": {
        "field": "autocomplete.raw"
      }
    }
  }
}

Results:

{
  "aggregations": {
    "suggestions": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "John Doe",
          "doc_count": 1
        },
        {
          "key": "John Deere",
          "doc_count": 1
        }
      ]
    }
  }
}
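The deduplication comes from the terms aggregation, not from the query: every matching document is counted, but each distinct autocomplete.raw value yields only one bucket. The sketch below models the term query plus aggregation in plain Python. The suggest function and the fourth "John Doe" document are hypothetical, added only to show that a repeated value still produces a single suggestion:

```python
from __future__ import annotations

from collections import Counter


def completion_tokens(text: str, min_gram: int = 1, max_gram: int = 24) -> list[str]:
    # keyword tokenizer + lowercase + edge_ngram, as in completion_analyzer
    token = text.lower()
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def suggest(docs: list[dict], prefix: str) -> dict[str, int]:
    """Model of: term query on autocomplete.completion + terms agg on autocomplete.raw."""
    # The term query is not analyzed: the prefix must equal an indexed n-gram exactly.
    hits = [d for d in docs if prefix in completion_tokens(d["autocomplete"])]
    # The terms aggregation on the keyword sub-field counts documents per exact value.
    return dict(Counter(d["autocomplete"] for d in hits))


DOCS = [
    {"firstName": "John", "lastName": "Doe", "autocomplete": "John Doe"},
    {"firstName": "John", "lastName": "Deere", "autocomplete": "John Deere"},
    {"firstName": "Johnny", "lastName": "Cash", "autocomplete": "Johnny Cash"},
    {"firstName": "John", "lastName": "Doe", "autocomplete": "John Doe"},  # hypothetical duplicate
]

if __name__ == "__main__":
    print(suggest(DOCS, "john d"))  # {'John Doe': 2, 'John Deere': 1}
```

With the duplicate document, "John Doe" still appears once, only with a doc_count of 2. Passing "John d" (capital J) returns nothing, because the term query compares the raw input against lowercased n-grams.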

UPDATE (June 25th, 2019):

ES 7.2 introduced a new data type called search_as_you_type that allows this kind of behavior natively. Read more at: https://www.elastic.co/guide/en/elasticsearch/reference/7.2/search-as-you-type.html
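For reference, a search_as_you_type field is normally queried with a multi_match query of type bool_prefix over the field and its generated _2gram and _3gram sub-fields. The snippet below only builds those request bodies as Python dicts, assuming an Elasticsearch 7.2+ index (no mapping type level) and reusing the autocomplete field name from the example above. Like the term query above, this matches documents, so identical values still come back once per document unless you aggregate or collapse them:

```python
import json

FIELD = "autocomplete"  # field name reused from the earlier example (assumption)

# ES 7.x mapping: properties sit directly under "mappings", with no type name.
mapping = {"mappings": {"properties": {FIELD: {"type": "search_as_you_type"}}}}


def bool_prefix_query(text: str) -> dict:
    """Search-as-you-type query body over the field and its shingle sub-fields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "bool_prefix",
                "fields": [FIELD, f"{FIELD}._2gram", f"{FIELD}._3gram"],
            }
        }
    }


if __name__ == "__main__":
    print("PUT my-index")
    print(json.dumps(mapping, indent=2))
    print("GET my-index/_search")
    print(json.dumps(bool_prefix_query("john d"), indent=2))
```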

A new option, skip_duplicates, was added to the completion suggester in the 6.x line (it is available from Elasticsearch 6.1).

From the docs at https://www.elastic.co/guide/en/elasticsearch/reference/master/search-suggesters-completion.html#skip_duplicates:

POST music/_search?pretty
{
    "suggest": {
        "song-suggest" : {
            "prefix" : "nor",
            "completion" : {
                "field" : "suggest",
                "skip_duplicates": true
            }
        }
    }
}
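Conceptually, skip_duplicates filters out suggestions whose text has already been returned while still trying to fill the requested size with distinct entries. The function below is a simplified model of that behaviour, not Elasticsearch's implementation: the complete function, the (text, weight) entry format and the sample titles are all hypothetical, and prefix matching is approximated with a lowercase startswith:

```python
from __future__ import annotations


def complete(entries: list[tuple[str, int]], prefix: str, size: int = 5,
             skip_duplicates: bool = False) -> list[str]:
    """Simplified model of a completion suggester over (text, weight) entries."""
    p = prefix.lower()
    # Prefix match, then highest weight first (sorted is stable for equal weights).
    matches = sorted((e for e in entries if e[0].lower().startswith(p)),
                     key=lambda e: -e[1])
    out: list[str] = []
    seen: set[str] = set()
    for text, _weight in matches:
        if skip_duplicates and text in seen:
            continue  # same suggestion text already returned: skip it
        seen.add(text)
        out.append(text)
        if len(out) == size:
            break
    return out


ENTRIES = [("Nora Jones", 10), ("Nora Jones", 7), ("Nordic Tales", 5), ("Nirvana", 9)]

if __name__ == "__main__":
    print(complete(ENTRIES, "nor"))                        # ['Nora Jones', 'Nora Jones', 'Nordic Tales']
    print(complete(ENTRIES, "nor", skip_duplicates=True))  # ['Nora Jones', 'Nordic Tales']
```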

We face exactly the same problem. In Elasticsearch 2.4, the approach you describe worked fine for us, but, as you say, the completion suggester has since become document-based, whereas, like you, we are only interested in unique words, not in the documents.

The only 'solution' we have come up with so far is to create a separate index containing just the words we want to run suggestion queries on, and to make sure that each distinct word is indexed only once in that index. You can then run the suggestion queries against this separate index. This is far from ideal, not least because we then have to keep this index in sync with the main index that serves our other queries.
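One possible way to handle the "each word only once" part (our assumption, not something the approach above prescribes) is to use the word itself as the document _id in the separate index: indexing the same word again then overwrites the existing document instead of adding a duplicate. The sketch builds such an NDJSON _bulk body; the word_bulk_body function, the words index, the word type and the word field are all hypothetical, and the regex word splitting is a simplification:

```python
from __future__ import annotations

import json
import re


def word_bulk_body(texts: list[str]) -> str:
    """Build an NDJSON body for e.g. POST words/word/_bulk (hypothetical names)."""
    # Distinct lowercase words across all texts; sorted for a deterministic order.
    words = sorted({w for t in texts for w in re.findall(r"\w+", t.lower())})
    lines = []
    for word in words:
        # Using the word as _id means re-indexing it overwrites instead of duplicating.
        lines.append(json.dumps({"index": {"_id": word}}))
        lines.append(json.dumps({"word": word}))
    return "\n".join(lines) + "\n"  # the _bulk API requires a trailing newline


if __name__ == "__main__":
    print(word_bulk_body(["John Doe", "John Deere", "Johnny Cash"]), end="")
```

This only prevents duplicates inside the words index; keeping it in sync with the main index (for example, removing words that no longer occur) is still up to you.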
