Elasticsearch: Sorting by nested documents' values
First a correction of terminology: in Elasticsearch, "parent/child" refers to completely separate docs, where the child doc points to the parent doc. Parent and children are stored on the same shard, but they can be updated independently.
With your example above, what you are trying to achieve can be done with nested docs.
Currently, your locations field is of type:"object". This means that the values in each location get flattened to look something like this:
{ "locations.category": [5322606, 5883712, 5322605], "locations.subCategory": [6032961], "locations.order": [1, 3, 2]}
In other words, the "sub" fields get flattened into multi-value fields, which is of no use to you, because there is no correlation between category: 5322606 and order: 1.
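To make the flattening concrete, here is a rough Python sketch that mimics it on your example doc. This is only an illustration of the effect, not Elasticsearch code: flatten_object_field is a made-up helper name, and skipping null values is an assumption chosen to match the flattened output shown above.

```python
from collections import defaultdict


def flatten_object_field(doc, field):
    """Collect every sub-field of doc[field] into one multi-value list.

    Hypothetical helper: it only imitates what happens to an "object"
    field, where the link between values of one location is lost.
    """
    flat = defaultdict(list)
    for item in doc[field]:
        for key, value in item.items():
            if value is not None:  # assumption: nulls contribute no value
                flat[f"{field}.{key}"].append(value)
    return dict(flat)


product = {
    "id": 5331880,
    "locations": [
        {"order": 1, "category": 5322606},
        {"order": 3, "subCategory": None, "category": 5883712},
        {"order": 2, "subCategory": 6032961, "category": 5322605},
    ],
}

flat = flatten_object_field(product, "locations")
for name, values in flat.items():
    print(name, values)
```

Once the values sit in three independent lists, nothing records that subCategory 6032961 belonged to the location with order 2 rather than the one with order 1 or 3.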
However, if you change locations to be type:"nested" then internally it will index each location as a separate doc, meaning that each location can be queried independently, using the dedicated nested query and filter.
By default, the nested query will return a _score based upon how well each location matches, but in your case you want to return the highest value of the order field from any matching children. To do this, you'll need to use a custom_score query.
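As a mental model of what we are after, here is a simplified Python sketch: treat each location as its own doc, keep only the ones that match, and take the highest order among them as the parent's score. It is plain Python, not how Elasticsearch implements it, and nested_max_order is a hypothetical name used only for this illustration.

```python
def nested_max_order(doc, category, sub_category):
    """Return the highest 'order' among locations matching both terms.

    Hypothetical helper: each location is checked on its own, so the
    category/subCategory/order values stay correlated.
    """
    matching_orders = [
        loc["order"]
        for loc in doc["locations"]
        if loc.get("category") == category
        and loc.get("subCategory") == sub_category
    ]
    # No matching location means the parent doc would not be a hit at all.
    return max(matching_orders) if matching_orders else None


product = {
    "id": 5331880,
    "locations": [
        {"order": 1, "category": 5322606},
        {"order": 3, "subCategory": None, "category": 5883712},
        {"order": 2, "subCategory": 6032961, "category": 5322605},
    ],
}

score = nested_max_order(product, 5322605, 6032961)
print(score)
```

Only the third location matches both terms, so its order value is the one that ends up as the score.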
So let's start by creating the index with the appropriate mapping:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
  "mappings" : {
    "products" : {
      "properties" : {
        "locations" : {
          "type" : "nested",
          "properties" : {
            "order" : { "type" : "long" },
            "subCategory" : { "type" : "long" },
            "category" : { "type" : "long" }
          }
        },
        "id" : { "type" : "long" }
      }
    }
  }
}'
Then we index your example doc:
curl -XPOST 'http://127.0.0.1:9200/test/products?pretty=1' -d '
{
  "locations" : [
    { "order" : 1, "category" : 5322606 },
    { "order" : 3, "subCategory" : null, "category" : 5883712 },
    { "order" : 2, "subCategory" : 6032961, "category" : 5322605 }
  ],
  "id" : 5331880
}'
And now we can search for it using the queries we discussed above:
curl -XGET 'http://127.0.0.1:9200/test/products/_search?pretty=1' -d '
{
  "query" : {
    "nested" : {
      "query" : {
        "custom_score" : {
          "script" : "doc[\u0027locations.order\u0027].value",
          "query" : {
            "constant_score" : {
              "filter" : {
                "and" : [
                  { "term" : { "category" : 5322605 } },
                  { "term" : { "subCategory" : 6032961 } }
                ]
              }
            }
          }
        }
      },
      "score_mode" : "max",
      "path" : "locations"
    }
  }
}'
Note: the single quotes within the script have been escaped as \u0027 to get around shell quoting. The script actually looks like this: "doc['locations.order'].value"
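If you build the request body in code rather than on the shell, the escaping is not needed. Here is a minimal Python sketch, assuming you serialize the same query with the standard json module and send it with whatever HTTP client you prefer (the sending part is left out; the variable names are mine).

```python
import json

# The script exactly as Elasticsearch should see it, single quotes and all.
script = "doc['locations.order'].value"

query = {
    "query": {
        "nested": {
            "path": "locations",
            "score_mode": "max",
            "query": {
                "custom_score": {
                    "script": script,
                    "query": {
                        "constant_score": {
                            "filter": {
                                "and": [
                                    {"term": {"category": 5322605}},
                                    {"term": {"subCategory": 6032961}},
                                ]
                            }
                        }
                    },
                }
            },
        }
    }
}

# json.dumps leaves single quotes untouched, so no \u0027 is required.
body = json.dumps(query)
print(body)
```

The resulting string is the same query as in the curl command, just without the shell getting in the way.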
If you look at the _score from the results, you can see that it has used the order value from the matching location:
{
  "hits" : {
    "hits" : [
      {
        "_source" : {
          "locations" : [
            { "order" : 1, "category" : 5322606 },
            { "order" : 3, "subCategory" : null, "category" : 5883712 },
            { "order" : 2, "subCategory" : 6032961, "category" : 5322605 }
          ],
          "id" : 5331880
        },
        "_score" : 2,
        "_index" : "test",
        "_id" : "cXTFUHlGTKi0hKAgUJFcBw",
        "_type" : "products"
      }
    ],
    "max_score" : 2,
    "total" : 1
  },
  "timed_out" : false,
  "_shards" : { "failed" : 0, "successful" : 5, "total" : 5 },
  "took" : 9
}
Just to add a more up-to-date version for sorting a parent by a child field: we can query the parent doc type sorted by a child field ('count', for example) along the following lines.