How do I compute facets/aggregations for the top n documents, with pagination in Elasticsearch?
As you probably saw in the documentation, aggregations are computed over the scope of the query itself. If no query is given, they are computed over a match_all list of results. Even if you use size at the query level, it will not give you what you need, because size only controls how many of the matched documents are returned. Aggregations operate on everything the query matches.
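To make this concrete, here is a minimal sketch of such a request body, built as a plain Python dict. The field name make and the aggregation name makes are assumptions for illustration only; date_added is the sort field from the question. The size and sort keys only shape the hits part of the response, while the terms aggregation is still computed over every document the query matches.

```python
import json


def build_top_n_request(n):
    """Build a search body that returns n hits but aggregates over ALL matches."""
    return {
        "query": {"match_all": {}},
        # sort and size only affect which hits are returned
        "sort": [{"date_added": {"order": "desc"}}],
        "size": n,
        # "make" is a hypothetical field; the aggregation ignores size above
        "aggs": {"makes": {"terms": {"field": "make"}}},
    }


body = build_top_n_request(1000)
print(json.dumps(body, indent=2))
```

Sending this body would return at most 1000 hits, but the makes buckets would count every car in the index, not just those 1000.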
This feature request is not new; it was asked for some time ago.
In 1.7 there is no straightforward solution. You may be able to use the limit filter or the terminate_after in-body request parameter, but neither takes the sort order into account. terminate_after gives you the first terminate_after documents that matched the query, and this number is per shard. The cut-off is not applied after sorting.
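As a rough sketch of the terminate_after workaround, the body below puts the parameter at the top level of the request. The make field and the makes aggregation name are assumptions for illustration, and the value 250 is arbitrary.

```python
import json


def build_terminate_after_request(docs_per_shard):
    """Build a search body that stops collecting after docs_per_shard docs on each shard."""
    return {
        # per-shard cut-off, applied while collecting, not after sorting
        "terminate_after": docs_per_shard,
        "query": {"match_all": {}},
        # "make" is a hypothetical field used only for illustration
        "aggs": {"makes": {"terms": {"field": "make"}}},
    }


terminate_body = build_terminate_after_request(250)
print(json.dumps(terminate_body, indent=2))
```

The aggregation then sees at most 250 documents per shard, but they are simply the first ones each shard collected, not the newest ones by date_added.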
In ES 2.0 there is also the sampler aggregation, which works more or less the same way as terminate_after, except that it uses the score of the documents to decide which ones to consider from each shard. If you just sort by date_added and the query is just a match_all, all the documents will have the same score and it will return an irrelevant set of documents.
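A minimal sketch of a sampler request follows, assuming ES 2.0 or later. The names top_docs and makes and the field make are made up for illustration. The nested terms aggregation only sees the documents the sampler keeps on each shard.

```python
import json


def build_sampler_request(shard_size):
    """Build a search body whose sub-aggregation runs on a per-shard sample."""
    return {
        # with match_all every score is equal, so the sample is arbitrary
        "query": {"match_all": {}},
        "size": 0,
        "aggs": {
            "top_docs": {
                # keep only the best-scoring shard_size docs on each shard
                "sampler": {"shard_size": shard_size},
                # "make" is a hypothetical field used only for illustration
                "aggs": {"makes": {"terms": {"field": "make"}}},
            }
        },
    }


sampler_body = build_sampler_request(200)
print(json.dumps(sampler_body, indent=2))
```

For the sample to be meaningful, the query would have to score documents in a way that reflects the ordering you care about, which a plain match_all does not.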
In conclusion:
- There is no good solution for this, only workarounds based on a number of docs per shard. So, if you want 1000 cars, take this number, divide it by the number of primary shards, use the result in the sampler aggregation or with terminate_after, and get a set of documents.
- My suggestion is to use a query to limit the number of documents (cars) by a different criterion instead. For example, show (and aggregate) the cars from the last 30 days or something similar. That is, the criterion should be included in the query itself, so that the resulting set of documents is the one you want aggregated. Applying aggregations to a certain number of documents, after they have been sorted, is not easy.
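Both options can be sketched in a few lines. The helper names, the make field, and the choice to round up are my assumptions; date_added and the 30-day window come from the discussion above. The first helper does the per-shard division, the second builds a query that restricts the aggregation scope by date instead of by document count.

```python
import json
import math


def per_shard_limit(total_docs, primary_shards):
    """Split a global target across primary shards, rounding up (an assumption)."""
    if primary_shards < 1:
        raise ValueError("primary_shards must be at least 1")
    return math.ceil(total_docs / primary_shards)


def build_recent_cars_request(days):
    """Build a search body that aggregates only cars added in the last `days` days."""
    return {
        # the limiting criterion lives in the query, so aggs see the same set
        "query": {"range": {"date_added": {"gte": "now-%dd/d" % days}}},
        "size": 0,
        # "make" is a hypothetical field used only for illustration
        "aggs": {"makes": {"terms": {"field": "make"}}},
    }


# 1000 cars over 5 primary shards gives 200 docs per shard
shard_limit = per_shard_limit(1000, 5)
recent_body = build_recent_cars_request(30)
print(shard_limit)
print(json.dumps(recent_body, indent=2))
```

Keep in mind that the per-shard figure is only an approximation of the global total, since shards rarely hold an equal share of the matching documents.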