How to improvise on heavy should queries in huge data set?


Given that you are retrieving the IDs of the documents, I assume that you are not executing a plain query but rather a scan, retrieving all the documents that satisfy your query.

Now, the first query is an intersection, whereas the second is a union. Given that these words appear in 5874, 270419 and 397829 docs, the intersection contains at most 5874 documents, whereas the union contains at least 397829 (and up to 674122 if the three sets do not overlap at all). These are the numbers of documents that your ES cluster will return in the two cases.
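The bounds can be checked with a few lines of arithmetic. This is only a sketch: the helper name `result_size_bounds` is made up for illustration, and the only inputs are the three per-word document counts quoted above.

```python
def result_size_bounds(doc_freqs):
    """Return (max_intersection, min_union, max_union) for per-term doc counts.

    Hypothetical helper, for illustration only.
    """
    # An intersection can never be larger than the rarest term's doc count.
    max_intersection = min(doc_freqs)
    # A union is at least as large as the most frequent term's doc count...
    min_union = max(doc_freqs)
    # ...and at most the sum of all counts (when the sets are disjoint).
    max_union = sum(doc_freqs)
    return max_intersection, min_union, max_union


print(result_size_bounds([5874, 270419, 397829]))  # (5874, 397829, 674122)
```

So the second query has to hand back roughly 68 times as many documents as the first, at the very least.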

The drastic difference in the time taken between the two cases comes from the number of documents that have to be returned. For scanning, you must be paginating (via scroll) and repeating the request in a loop, and that takes longer as the number of documents increases.
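To get a feel for the loop cost, here is a rough count of how many scroll pages carry hits in each case. The page size of 1000 is an assumption (the question does not say what size was used), and `scroll_round_trips` is a hypothetical helper, not part of any client library.

```python
import math


def scroll_round_trips(total_hits, page_size):
    """Number of scroll pages that contain hits (hypothetical helper)."""
    return math.ceil(total_hits / page_size)


# Assumed page size of 1000 documents per scroll request.
print(scroll_round_trips(5874, 1000))    # 6
print(scroll_round_trips(397829, 1000))  # 398
```

Each page is a separate request to the cluster, so the union case pays for hundreds of round trips where the intersection case pays for a handful.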

If you just execute a query with some size limit instead of scanning, it is likely to finish in nearly the same time in both cases.
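A size-limited request body for both variants might look like the sketch below. The field name `body` and the three terms are placeholders, since the original queries are not shown here, and `build_bool_query` is a hypothetical helper that only builds the JSON body; sending it to the search endpoint is left to whatever client you use.

```python
def build_bool_query(field, terms, occur, size=10):
    """Build a size-limited bool query body (hypothetical helper).

    occur="must" gives the intersection, occur="should" gives the union.
    """
    if occur not in ("must", "should"):
        raise ValueError("occur must be 'must' or 'should'")
    clauses = [{"term": {field: term}} for term in terms]
    # "size" caps how many hits come back, regardless of how many match.
    return {"size": size, "query": {"bool": {occur: clauses}}}


# Placeholder field name and terms, for illustration only.
terms = ["alpha", "beta", "gamma"]
intersection_body = build_bool_query("body", terms, "must", size=100)
union_body = build_bool_query("body", terms, "should", size=100)
print(union_body)
```

With the same `size` on both, the cluster returns at most 100 hits either way, which is why the timings should come out close.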