Elasticsearch: Scoring with Ngrams
You can solve that by using an edgeNGram tokenizer instead of an edgeNGram filter:
settings: {
  analysis: {
    tokenizer: {
      ngram_tokenizer: {
        type: 'edge_ngram',
        min_gram: 2,
        max_gram: 15
      }
    },
    analyzer: {
      ngram_analyzer: {
        type: 'custom',
        tokenizer: 'ngram_tokenizer',
        filter: [ 'lowercase' ]
      }
    }
  }
}
The reason is that the edgeNGram filter writes all the terms for a given token at the same position (much as synonyms do), while the edgeNGram tokenizer creates tokens at distinct, increasing positions. The number of positions affects the field's length normalization, and therefore the score.
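To make the position difference concrete, here is a minimal Python sketch (an illustration, not Lucene's actual implementation) of how the two approaches assign positions to the edge n-grams of a single word. The helper names are invented for this example; only the min_gram/max_gram values come from the settings above.

```python
def edge_ngrams(term, min_gram=2, max_gram=15):
    # All leading substrings of `term` between min_gram and max_gram chars.
    return [term[:n] for n in range(min_gram, min(max_gram, len(term)) + 1)]

def tokenizer_output(text, min_gram=2, max_gram=15):
    # edge_ngram tokenizer: each gram gets its own position,
    # so the field "looks longer" to length normalization.
    tokens, pos = [], 0
    for word in text.split():
        for gram in edge_ngrams(word, min_gram, max_gram):
            tokens.append((gram, pos))
            pos += 1
    return tokens

def filter_output(text, min_gram=2, max_gram=15):
    # edge_ngram filter: every gram is stacked at the position of
    # the original token, like synonyms.
    tokens = []
    for pos, word in enumerate(text.split()):
        for gram in edge_ngrams(word, min_gram, max_gram):
            tokens.append((gram, pos))
    return tokens

print(tokenizer_output("search"))
# grams at positions 0..4: ('se', 0), ('sea', 1), ('sear', 2), ...
print(filter_output("search"))
# same grams, all at position 0: ('se', 0), ('sea', 0), ('sear', 0), ...
```

Both analyzers produce the same five grams for "search", but the tokenizer yields five positions where the filter yields one, which is exactly the difference that feeds into length normalization.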
Note that this works only on pre-2.0 ES releases: there, a compound score is computed from the scores of all ngram tokens, whereas in ES 2.x only the matching token is scored.