Simple explanation of different ElasticSearch similarity algorithms

algorithm search lucene elasticsearch scoring

The problem you run into here is that, by the description given in the linked answer, Lucene's default similarity and bm25 are fundamentally identical, in that both factor in:

documents with more occurrences of a term are preferred
terms that are rarer in the corpus are preferred
shorter documents are weighted more heavily
other adjustments to the score, such as boosts
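
To make the shared factors above concrete, here is a rough Python sketch of a per-term score in each style. It assumes "Lucene's default similarity" means the classic TF-IDF similarity (the default before BM25 replaced it in Lucene 6 / Elasticsearch 5), and it uses the textbook BM25 formula with the usual k1 = 1.2 and b = 0.75. The function names are made up for illustration, and query normalization, coord, and boosts are left out, so the numbers will not match what Lucene reports.

```python
import math


def classic_tfidf(freq, doc_freq, doc_count, doc_len):
    """Simplified per-term score in the style of Lucene's classic TF-IDF."""
    tf = math.sqrt(freq)                                      # more occurrences -> higher
    idf = 1.0 + math.log((doc_count + 1) / (doc_freq + 1))    # rarer term -> higher
    length_norm = 1.0 / math.sqrt(doc_len)                    # shorter document -> higher
    return tf * idf * idf * length_norm


def bm25(freq, doc_freq, doc_count, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Textbook BM25 per-term score (not Lucene's exact implementation)."""
    idf = math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length adjustment: documents longer than average are penalized.
    length_adjust = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    # Term frequency saturates instead of growing without bound.
    return idf * freq * (k1 + 1.0) / (freq + length_adjust)


if __name__ == "__main__":
    # Hypothetical corpus statistics, chosen only for illustration.
    for freq in (1, 2, 5, 20):
        print(
            freq,
            round(classic_tfidf(freq, 10, 1000, 100), 4),
            round(bm25(freq, 10, 1000, 100, 100), 4),
        )
```

Both functions move in the same direction for each of the three factors. The visible difference is in shape: the square-root term frequency keeps growing, while the BM25 term frequency levels off as freq increases.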

dfr alone encompasses 7 different base models, each using a different scoring algorithm, followed by two highly configurable normalization steps. Many of its configurations fit the very general steps above; some diverge from them.
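
As an illustration of how much of dfr is configuration, here is a sketch of index settings for a custom DFR similarity, written as a Python dict that could be sent as the body of an index-creation request. The option keys follow the Elasticsearch similarity module documentation; the similarity name my_dfr is made up, and the set of accepted basic_model values depends on the Elasticsearch version, so treat the specific choices as assumptions. The two follow-up steps mentioned above appear to correspond to after_effect and normalization.

```python
import json

# Hypothetical similarity name "my_dfr"; option values are one possible choice.
dfr_index_settings = {
    "settings": {
        "index": {
            "similarity": {
                "my_dfr": {
                    "type": "DFR",
                    "basic_model": "g",           # which base model scores the term
                    "after_effect": "l",          # first normalization (information gain)
                    "normalization": "h2",        # second normalization (document length)
                    "normalization.h2.c": "3.0",  # tuning parameter for h2
                }
            }
        }
    }
}

if __name__ == "__main__":
    # Print the JSON body that would accompany an index-creation request.
    print(json.dumps(dfr_index_settings, indent=2))
```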

Similarly, ib allows significant configuration, but it generally hits the same high points: favoring higher term frequency, favoring matches on rarer terms (by some measure), and adjusting for document length, boosts, and other possible normalizations.
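
For comparison, a custom ib similarity might be declared and attached to a field roughly as follows. This is a sketch under assumptions: the option keys (distribution, lambda, normalization) follow the Elasticsearch similarity module documentation, while the similarity name my_ib and the field name body are hypothetical, and the typeless mappings layout assumes a recent Elasticsearch version.

```python
import json

# Hypothetical names: similarity "my_ib", text field "body".
ib_index_body = {
    "settings": {
        "index": {
            "similarity": {
                "my_ib": {
                    "type": "IB",
                    "distribution": "ll",   # probability distribution: "ll" or "spl"
                    "lambda": "df",         # how term rarity is estimated: "df" or "ttf"
                    "normalization": "h2",  # document length normalization
                }
            }
        }
    },
    "mappings": {
        "properties": {
            # The similarity is chosen per field in the mapping.
            "body": {"type": "text", "similarity": "my_ib"}
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(ib_index_body, indent=2))
```

The three options line up with the high points listed above: distribution governs how term frequency is rewarded, lambda governs how rarity is measured, and normalization adjusts for document length.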
