What causes different search results for same elastic search query on two nodes


The mismatch in hits is most likely caused by the primary shards and their replicas being out of sync. This can happen if a node left the cluster (for whatever reason) while you kept making changes to documents (indexing, deleting, updating).
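One way to check for this kind of drift is to compare the document counts of each shard's primary and replica copies. The sketch below assumes you have already fetched the plain-text output of `GET _cat/shards?h=index,shard,prirep,docs`; the function name `find_mismatched_shards` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def find_mismatched_shards(cat_shards_text):
    """Return {(index, shard): {doc counts}} for shards whose copies disagree.

    Expects whitespace-separated rows of: index shard prirep docs
    """
    counts = defaultdict(set)
    for line in cat_shards_text.strip().splitlines():
        parts = line.split()
        if len(parts) != 4 or not parts[3].isdigit():
            continue  # skip rows without a docs value (e.g. unassigned shards)
        index, shard, _prirep, docs = parts
        counts[(index, int(shard))].add(int(docs))
    # A shard is suspicious when its copies report more than one distinct count.
    return {key: seen for key, seen in counts.items() if len(seen) > 1}


# Hypothetical sample output, not taken from a real cluster.
sample = """
myindex 0 p 1200
myindex 0 r 1187
myindex 1 p 950
myindex 1 r 950
"""
mismatched = find_mismatched_shards(sample)
print(mismatched)
```

Counts may also differ briefly on a healthy cluster while indexing is in progress, so a mismatch is only a hint, and it is worth re-checking once writes have stopped.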

The scoring part is a different story, and is explained by the "Relevancy Scoring" section of this blog post:

Elasticsearch faces an interesting dilemma when you execute a search. Your query needs to find all the relevant documents...but these documents are scattered around any number of shards in your cluster. Each shard is basically a Lucene index, which maintains its own TF and DF statistics. A shard only knows how many times "pineapple" appears within the shard, not the entire cluster.
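To see why per-shard statistics change the score, here is a toy calculation. It uses a classic TF-IDF style inverse document frequency formula purely for illustration (the exact formula depends on the similarity in use), and the shard sizes and term counts are invented.

```python
import math


def idf(doc_freq, doc_count):
    # Classic TF-IDF style IDF, used here only as an illustration.
    return 1.0 + math.log(doc_count / (doc_freq + 1))


# Hypothetical shards: (documents in shard, documents containing "pineapple")
shards = {"shard_0": (1000, 10), "shard_1": (1000, 200)}

# Each shard scores with the statistics it knows about locally.
local = {name: idf(df, n) for name, (n, df) in shards.items()}

# Cluster-wide statistics are the sums over all shards.
total_docs = sum(n for n, _ in shards.values())
total_df = sum(df for _, df in shards.values())
global_idf = idf(total_df, total_docs)

for name, value in local.items():
    print(f"{name}: local idf = {value:.3f}")
print(f"global idf = {global_idf:.3f}")
```

The same term gets a noticeably different weight on each shard, so an identical document can score differently depending on which shard copy holds it.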

I would try searching with "DFS Query Then Fetch", that is, _search?search_type=dfs_query_then_fetch. It first collects term statistics from all shards and scores against those global values, which should make the scoring more accurate.
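A minimal sketch of building that search URL, assuming a node on `localhost:9200` and an index called `myindex` (both placeholders); the helper name `search_url` is made up.

```python
from urllib.parse import urlencode, urlunsplit


def search_url(host, index, search_type="dfs_query_then_fetch"):
    # search_type is passed as a query-string parameter of the _search endpoint.
    query = urlencode({"search_type": search_type})
    return urlunsplit(("http", host, f"/{index}/_search", query, ""))


url = search_url("localhost:9200", "myindex")
print(url)  # http://localhost:9200/myindex/_search?search_type=dfs_query_then_fetch
```

The query body itself stays the same; only the search type changes, at the cost of an extra round trip to the shards.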

Also, the different document counts caused by document changes during the node disconnect affect the score calculation, even after deleting and rebuilding the index. This might be because changes were applied differently on the replicas and on the primary shards; more specifically, documents have been deleted. A deleted document is only marked as deleted at first and keeps contributing to the shard's term statistics; it is permanently removed from the index at segment merging time. And segment merging doesn't happen unless certain conditions are met in the underlying Lucene instance.
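The effect of unmerged deletes can be mimicked with a toy model. This is not how Lucene is implemented; `ToyShard` is a made-up class that only imitates the behaviour described above, where a deleted document disappears from the hits right away but still counts in the term statistics until a merge.

```python
class ToyShard:
    def __init__(self):
        self.docs = {}        # doc_id -> set of terms, as stored in segments
        self.deleted = set()  # ids marked as deleted but not yet merged away

    def index(self, doc_id, text):
        self.docs[doc_id] = set(text.split())

    def delete(self, doc_id):
        self.deleted.add(doc_id)  # only a marker, the data stays in place

    def doc_freq(self, term):
        # Statistics come from the segments, so deleted docs still count.
        return sum(term in terms for terms in self.docs.values())

    def live_hits(self, term):
        return sorted(d for d, terms in self.docs.items()
                      if term in terms and d not in self.deleted)

    def merge(self):
        # A merge rewrites the segments without the deleted documents.
        for doc_id in self.deleted:
            self.docs.pop(doc_id, None)
        self.deleted.clear()


primary, replica = ToyShard(), ToyShard()
for shard in (primary, replica):
    shard.index(1, "fresh pineapple")
    shard.index(2, "pineapple juice")
    shard.index(3, "apple juice")
    shard.delete(2)

primary.merge()  # only the primary happened to merge its segments

print(primary.live_hits("pineapple"), primary.doc_freq("pineapple"))
print(replica.live_hits("pineapple"), replica.doc_freq("pineapple"))
```

Both copies return the same hit, yet they disagree on how many documents contain the term, which is enough to produce different scores for the same query.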

A forced merge can be initiated with a POST to /_optimize?max_num_segments=1 (the endpoint was renamed to /_forcemerge in later Elasticsearch versions). Warning: this can take a really long time (depending on the size of the index), requires significant I/O and CPU resources, and should not be run on an index that is still receiving changes. Documentation: Optimize, Segments Merging
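As a rough sketch, the request could be prepared like this with the standard library. The base URL, the index name and the helper `force_merge_request` are assumptions; the `endpoint` argument is there so you can switch to `_forcemerge` on versions where `_optimize` no longer exists. The request is only built here, not sent.

```python
from urllib.request import Request


def force_merge_request(base_url, index=None, max_num_segments=1,
                        endpoint="_optimize"):
    # Without an index the call targets all indices, as in /_optimize above.
    path = f"/{index}/{endpoint}" if index else f"/{endpoint}"
    url = f"{base_url}{path}?max_num_segments={max_num_segments}"
    return Request(url, method="POST")


req = force_merge_request("http://localhost:9200", "myindex")
print(req.get_method(), req.full_url)

# To actually send it (this can run for a long time on a large index):
# from urllib.request import urlopen
# with urlopen(req) as response:
#     print(response.read().decode())
```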