Trying to set the max_gram and min_gram in Elasticsearch
I faced a similar issue, and the error message below explains the problem clearly.
[400] {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: 1 but was [49]. This limit can be set by changing the [index.max_ngram_diff] index level setting."}],"type":"illegal_argument_exception","reason":"The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: 1 but was [49]. This limit can be set by changing the [index.max_ngram_diff] index level setting."},"status":400}
By default, the difference between max_gram and min_gram in the NGram tokenizer can't be more than 1. If you want a larger difference, raise the limit by adding the setting below to your index settings.
"max_ngram_diff" : "50" --> choose this number according to your requirements.
Below are my index settings. The loginNGram tokenizer has a difference of 47 between its max_gram and min_gram, so I set max_ngram_diff to 50.
{ "settings": { "index": { "analysis": { "analyzer": { "prefix": { "type": "custom", "filter": [ "lowercaseFilter" ], "tokenizer": "edgeNGramTokenizer" } }, "tokenizer": { "edgeNGramTokenizer": { "token_chars": [ "letter", "digit" ], "min_gram": "1", "type": "edgeNGram", "max_gram": "40" }, "loginNGram": { "type": "nGram", "min_gram": "3", "max_gram": "50" } } }, "number_of_shards": "1", "number_of_replicas": "0", "max_ngram_diff" : "50" } }}
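To pick a value for max_ngram_diff, it can help to compute the largest gap actually present in the settings. The following is a minimal sketch, not part of Elasticsearch: the helper name `required_max_ngram_diff` is hypothetical, and it assumes the limit only matters for components of type `ngram`/`nGram` (not the edge n-gram variants) and that missing values fall back to min_gram 1 and max_gram 2.

```python
import json

# Assumption: only plain n-gram components count toward the limit,
# so edge n-gram types such as "edgeNGram" are ignored here.
NGRAM_TYPES = {"ngram", "nGram"}


def required_max_ngram_diff(settings: dict) -> int:
    """Return the largest max_gram - min_gram gap among n-gram tokenizers and filters."""
    analysis = settings.get("settings", {}).get("index", {}).get("analysis", {})
    largest = 0
    for section in ("tokenizer", "filter"):
        for definition in analysis.get(section, {}).values():
            if definition.get("type") in NGRAM_TYPES:
                # Fallbacks of 1 and 2 are the defaults mentioned in this answer.
                low = int(definition.get("min_gram", 1))
                high = int(definition.get("max_gram", 2))
                largest = max(largest, high - low)
    return largest


# A trimmed copy of the tokenizer section from the settings above.
example = json.loads("""
{ "settings": { "index": { "analysis": { "tokenizer": {
    "edgeNGramTokenizer": { "type": "edgeNGram", "min_gram": "1", "max_gram": "40" },
    "loginNGram": { "type": "nGram", "min_gram": "3", "max_gram": "50" }
} } } } }
""")

print(required_max_ngram_diff(example))  # 47
```

Any max_ngram_diff greater than or equal to the printed value should satisfy the check, which is why 50 works for the settings above.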
Edit: The official Elasticsearch documentation explains that max_gram defaults to 2 and min_gram defaults to 1, so by default the difference between them can't be more than 1, hence the exception. Here is a snippet from the same documentation:
The index level setting index.max_ngram_diff controls the maximum allowed difference between max_gram and min_gram.
One can also use an index template to apply the setting automatically to all new indices:
curl -X PUT "localhost:9200/_index_template/template_1?pretty" -H 'Content-Type: application/json' -d'{ "index_patterns": [ "*" ], "template": { "settings": { "index": { "max_ngram_diff": 50 } } }}'
The template is not deleted when every index is removed; it has to be removed manually:
curl -X DELETE "localhost:9200/_index_template/template_1"
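The same two template requests can also be built from Python with only the standard library. This is a rough sketch under the same assumptions as the curl commands (an unsecured cluster at localhost:9200 and a template named template_1); the function names are made up for illustration, and the requests are only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local cluster without authentication


def put_template_request(name: str, max_ngram_diff: int) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates the index template."""
    body = {
        "index_patterns": ["*"],
        "template": {"settings": {"index": {"max_ngram_diff": max_ngram_diff}}},
    }
    return urllib.request.Request(
        f"{BASE_URL}/_index_template/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def delete_template_request(name: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request that removes the template."""
    return urllib.request.Request(f"{BASE_URL}/_index_template/{name}", method="DELETE")


put_req = put_template_request("template_1", 50)
del_req = delete_template_request("template_1")
print(put_req.get_method(), put_req.full_url)
print(del_req.get_method(), del_req.full_url)
```

Passing either request object to `urllib.request.urlopen` would send it to the cluster.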