What causes different search results for same elastic search query on two nodes

jakarta-ee elasticsearch

The hits mismatch is, most probably, because of an un-sync between the primary shards and the replica. This can happen if you had a node leaving the cluster (for whatever reason) but kept making changes to documents (indexing, deleting, updating).

The scoring part is a different story, and can be explained by "Relevancy Scoring" section from this blog post:

Elasticsearch faces an interesting dilemma when you execute a search. Your query needs to find all the relevant documents...but these documents are scattered around any number of shards in your cluster. Each shard is basically a Lucene index, which maintains its own TF and DF statistics. A shard only knows how many times "pineapple" appears within the shard, not the entire cluster.

I would give it a try, when searching, to "DFS Query Then Fetch", meaning _search?search_type=dfs_query_then_fetch .... that should help with the accuracy of scoring.

Also the different document count caused by document changes during the node disconnect affects the score calculation after even after deleting and rebuilding the index. This might be because changes to documents happened differently on the replica and on the primary shards, more specifically documents have been deleted. A deleted document is permanently removed from the index at segments merging time. And segments merging doesn't happen unless certain conditions are met in the underlying Lucene instance.

A forced merging can be initiated by a POST to /_optimize?max_num_segments=1. Warning: This takes a really long time (depending on the size of the index) and will require significant IO resources and CPU and should not be run on an index where changes are being made. Documentation: Optimize, Segments Merging

CodeHunter

What causes different search results for same elastic search query on two nodes

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last