How does date rounding in elasticsearch work, and how does it affect cache churn?


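As I understand it, dates are stored with millisecond precision, so a filter on an unrounded now is different on every request and its cached result can never be reused. Each such request only adds another cache entry that nobody asks for again, which is the cache churn. Rounding makes the filter more general and makes it more likely that its result can be reused by some other query. The exact point it rounds to matters less than the fact that every query in the same period rounds to the same value, so the cache can reuse it. For now/d that point is the start of the current day (in UTC unless a time zone is specified).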
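A minimal sketch in plain Python of why rounding helps, assuming a cached filter is only reused when its key matches exactly. The cache_key function and the two request times are hypothetical; round_down_ms mimics now/d by rounding an epoch-millisecond timestamp down to midnight UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
DAY_MS = 24 * 60 * 60 * 1000  # milliseconds in one day

def to_epoch_ms(dt: datetime) -> int:
    """Exact epoch milliseconds for a timezone-aware datetime."""
    return (dt - EPOCH) // timedelta(milliseconds=1)

def round_down_ms(ts_ms: int, unit_ms: int) -> int:
    """Round an epoch-millisecond timestamp down to the start of its unit (UTC)."""
    return ts_ms - ts_ms % unit_ms

def cache_key(field: str, lower_bound_ms: int) -> tuple:
    """Hypothetical cache key: a cached filter is only reused if its key matches exactly."""
    return ("range", field, "gte", lower_bound_ms)

# Two requests made several hours apart on the same day.
t1 = to_epoch_ms(datetime(2014, 1, 3, 9, 15, 42, 123000, tzinfo=timezone.utc))
t2 = to_epoch_ms(datetime(2014, 1, 3, 17, 2, 5, 987000, tzinfo=timezone.utc))

# Unrounded "now": each request produces a new key, so nothing is reused.
print(cache_key("record_datetime", t1) == cache_key("record_datetime", t2))  # False

# Rounded "now/d": both requests map to midnight UTC and share one key.
r1, r2 = round_down_ms(t1, DAY_MS), round_down_ms(t2, DAY_MS)
print(cache_key("record_datetime", r1) == cache_key("record_datetime", r2))  # True
print(EPOCH + timedelta(milliseconds=r1))  # 2014-01-03 00:00:00+00:00
```

Any request made on 3 January produces the same rounded bound, so one cache entry serves the whole day instead of one entry per request.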

Because the order in which filters are applied matters, the sooner we narrow the records down, the better. Ideally, the first filter is a cached one that filters out as much as possible. That's why, if we want data from the last hour, it makes sense to first filter out everything except today's records.
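A toy model in plain Python (not how Elasticsearch executes filters internally) makes the ordering argument concrete. A precomputed bitset stands in for the cached "today" filter, and a counting predicate stands in for the uncached last-hour check. The document count, the fixed now and the 30-day spread are made-up assumptions.

```python
import random

HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS
NOW_MS = 1_388_761_200_000  # hypothetical "now": 2014-01-03T15:00:00Z

class CountingFilter:
    """Wraps a per-document predicate and counts how often it is evaluated."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.calls = 0

    def __call__(self, ts):
        self.calls += 1
        return self.predicate(ts)

random.seed(0)
# Hypothetical index: 100 000 timestamps spread over the last 30 days.
docs = [NOW_MS - random.randrange(30 * DAY_MS) for _ in range(100_000)]

# Cached coarse filter (now-1h rounded down to the day), computed once and reused.
day_start = (NOW_MS - HOUR_MS) // DAY_MS * DAY_MS
today_bitset = [ts >= day_start for ts in docs]

# Order A: the uncached last-hour check runs on every document.
check_a = CountingFilter(lambda ts: ts >= NOW_MS - HOUR_MS)
hits_a = [i for i, ts in enumerate(docs) if check_a(ts)]

# Order B: the cached bitset goes first; `and` short-circuits, so the
# uncached check only runs on documents from today.
check_b = CountingFilter(lambda ts: ts >= NOW_MS - HOUR_MS)
hits_b = [i for i, ts in enumerate(docs) if today_bitset[i] and check_b(ts)]

print(hits_a == hits_b, check_a.calls, check_b.calls)
```

Both orders return the same documents, but in order B the expensive check runs only on the small fraction of documents that survive the cached filter.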

Let's consider first condition you mentioned:

record_datetime >= now/d && record_datetime >= now-1h

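It may seem that the first condition is redundant and could be removed without any side effects. But Elasticsearch benefits from it: it can reuse the cached result it has stored for the rounded filter and run the second, uncached filter on a much smaller set. Keep in mind that by reversing the order of the filters we would lose all the benefits of this redundancy. One caveat: during the first hour after midnight, now-1h falls on the previous day, so now/d would wrongly exclude the last part of that day. Rounding now-1h/d instead keeps the first filter cacheable without dropping those records.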
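A minimal sketch of the two conditions as an Elasticsearch request body, built as a Python dict and printed as JSON. It assumes the bool query's filter clause found in recent Elasticsearch versions (older releases used a different syntax), and it uses now-1h/d for the coarse clause so that it also covers the first hour after midnight. For gte, the /d rounding goes down to the start of that day.

```python
import json

def last_hour_filter(field: str) -> dict:
    """Bool filter: a cacheable, day-rounded clause followed by the exact clause."""
    return {
        "bool": {
            "filter": [
                # Rounded: identical for every request made on the same day.
                {"range": {field: {"gte": "now-1h/d"}}},
                # Exact: changes every millisecond, so its result is not reusable.
                {"range": {field: {"gte": "now-1h"}}},
            ]
        }
    }

query = {"query": last_hour_filter("record_datetime")}
print(json.dumps(query, indent=2))
```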

As you mentioned, this can also be used when looking further into the past. You could use a filter that throws out everything after some day. For example, if we need data from the first week of this year, we could do something along these lines:

record_datetime >= 01.01.2014 && record_datetime <= 05.01.2014 && other_filters
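The same idea with absolute dates, sketched as a Python dict in the recent bool/filter query syntax (an assumption about your version). The date window is written as a half-open range in ISO format rather than <= 05.01.2014, so that all of 5 January is included without relying on how a date-only upper bound is interpreted. The status term filter is a hypothetical placeholder for other_filters.

```python
import json

def first_week_filter(field: str, other_filters: list) -> dict:
    """Bool filter: a fixed date window plus arbitrary other clauses."""
    return {
        "bool": {
            "filter": [
                # Fixed bounds: identical on every request, so the cached result is reusable.
                # Half-open [2014-01-01, 2014-01-06) includes all of 5 January.
                {"range": {field: {"gte": "2014-01-01", "lt": "2014-01-06"}}},
                *other_filters,
            ]
        }
    }

# Hypothetical stand-in for other_filters: a term filter on a "status" field.
query = {"query": first_week_filter("record_datetime", [{"term": {"status": "active"}}])}
print(json.dumps(query, indent=2))
```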

The other filters don't have to be time-related. If this query is executed many times, only other_filters will be fully executed; the rest will use cached bitsets.
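A toy filter cache in plain Python (hypothetical, not Elasticsearch's actual cache) illustrates the repeated-query case. The date window is computed once under a fixed key, while other_filters runs on every query but only for documents inside the window. The documents and the status field are made up.

```python
from datetime import date
from typing import Callable

class FilterCache:
    """Toy cache: maps a filter's key to a bitset computed once over all documents."""

    def __init__(self, docs):
        self.docs = docs
        self.cache = {}
        self.misses = 0

    def bitset(self, key, predicate):
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = [predicate(d) for d in self.docs]
        return self.cache[key]

docs = [
    {"day": date(2014, 1, 2), "status": "active"},
    {"day": date(2014, 1, 4), "status": "closed"},
    {"day": date(2014, 1, 9), "status": "active"},
    {"day": date(2013, 12, 31), "status": "active"},
]
cache = FilterCache(docs)
START, END = date(2014, 1, 1), date(2014, 1, 6)

def run_query(other: Callable[[dict], bool]) -> list:
    # Date window: looked up by a fixed key, so it is computed only once.
    window = cache.bitset(("range", START, END), lambda d: START <= d["day"] < END)
    # other_filters: evaluated on every query, but only for documents in the window.
    return [i for i, d in enumerate(docs) if window[i] and other(d)]

print(run_query(lambda d: d["status"] == "active"))  # [0]
print(run_query(lambda d: d["status"] == "closed"))  # [1]
print(run_query(lambda d: True))                     # [0, 1]
print(cache.misses)                                  # 1
```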

What's more, this approach can be applied to any numeric data. For example, before filtering by exact latitude and longitude, first filter by some coarse grid cell or by city. The goal is to keep filters as similar as possible between queries so that their cached results can be reused.
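A minimal sketch of the coarse-grid idea, assuming a geo_point field named location (hypothetical) and the geo_bounding_box and geo_distance filters in the bool/filter syntax. The coarse box is the point's 1-degree grid cell widened by one cell on every side, so it still contains the whole search circle as long as the radius is smaller than one cell; the poles and the antimeridian are ignored here.

```python
import json
import math

def coarse_box(lat: float, lon: float, cell_deg: float = 1.0) -> dict:
    """Snap a point to its grid cell, widened by one cell on every side.

    Nearby points share the same box, so the filter (and its cached result) repeats
    across queries. Assumes the search radius is smaller than one cell.
    """
    south = (math.floor(lat / cell_deg) - 1) * cell_deg
    west = (math.floor(lon / cell_deg) - 1) * cell_deg
    return {
        "top_left": {"lat": south + 3 * cell_deg, "lon": west},
        "bottom_right": {"lat": south, "lon": west + 3 * cell_deg},
    }

def nearby_filter(field: str, lat: float, lon: float, distance: str) -> dict:
    """Coarse, reusable bounding box first, then the exact distance check."""
    return {
        "bool": {
            "filter": [
                {"geo_bounding_box": {field: coarse_box(lat, lon)}},
                {"geo_distance": {"distance": distance, field: {"lat": lat, "lon": lon}}},
            ]
        }
    }

# Two different nearby search centres produce the same coarse clause.
a = nearby_filter("location", 52.52, 13.40, "5km")
b = nearby_filter("location", 52.37, 13.06, "5km")
print(a["bool"]["filter"][0] == b["bool"]["filter"][0])  # True
print(json.dumps(a, indent=2))
```

Only the exact geo_distance clause differs between the two queries; the bounding-box clause is identical and can be served from the cache.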

Not sure if I'm being clear enough :) There's a nice article about improving performance in ES with filters, and the exact technique you're asking about is explained here. There's also official ES documentation about filter order and caching here.