How to highlight nested fields in Elasticsearch
There are a number of things you can do here with a parent/child relationship. I'll go over a few, and hopefully that will point you in the right direction; it will still take a fair amount of testing to find out whether this approach actually performs better for you. Also, for clarity, I left out a few details of your setup. Please forgive the long post.
I set up a parent/child mapping as follows:
DELETE /test_index

PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
   },
   "mappings": {
      "parent_doc": {
         "properties": {
            "identifier": { "type": "string" },
            "description": { "type": "string" }
         }
      },
      "child_doc": {
         "_parent": { "type": "parent_doc" },
         "properties": {
            "content": { "type": "string" }
         }
      }
   }
}
Then I added some test docs:
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"parent_doc","_id":1}}
{"identifier": "first", "description":"some special text"}
{"index":{"_index":"test_index","_type":"child_doc","_parent":1}}
{"content":"text that is special"}
{"index":{"_index":"test_index","_type":"child_doc","_parent":1}}
{"content":"text that is not"}
{"index":{"_index":"test_index","_type":"parent_doc","_id":2}}
{"identifier": "second", "description":"some different text"}
{"index":{"_index":"test_index","_type":"child_doc","_parent":2}}
{"content":"different child text, but special"}
{"index":{"_index":"test_index","_type":"parent_doc","_id":3}}
{"identifier": "third", "description":"we don't want this parent"}
{"index":{"_index":"test_index","_type":"child_doc","_parent":3}}
{"content":"or this child"}
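If you generate this bulk body from code rather than typing it by hand, remember that `_bulk` expects newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. A minimal Python sketch, assuming the same index, types and docs as above; `build_bulk_body` and `DOCS` are hypothetical names, and the sketch only builds the string (send it with whatever client you use).

```python
import json

# Test docs from above: (type, explicit id or None, parent id or None, source)
DOCS = [
    ("parent_doc", 1, None, {"identifier": "first", "description": "some special text"}),
    ("child_doc", None, 1, {"content": "text that is special"}),
    ("child_doc", None, 1, {"content": "text that is not"}),
    ("parent_doc", 2, None, {"identifier": "second", "description": "some different text"}),
    ("child_doc", None, 2, {"content": "different child text, but special"}),
    ("parent_doc", 3, None, {"identifier": "third", "description": "we don't want this parent"}),
    ("child_doc", None, 3, {"content": "or this child"}),
]


def build_bulk_body(index, docs):
    """Return a newline-delimited _bulk body: an action line then a source line per doc."""
    lines = []
    for doc_type, doc_id, parent_id, source in docs:
        action = {"_index": index, "_type": doc_type}
        if doc_id is not None:
            action["_id"] = doc_id
        if parent_id is not None:
            # Children must be routed to their parent via _parent.
            action["_parent"] = parent_id
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("test_index", DOCS)
print(body)
```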
If I'm understanding your specs correctly, we would want a query for "special" to return every one of these documents except the last two, i.e. the third parent and its child (correct me if I'm wrong). We want docs that match the text, docs that have a child matching the text, and docs that have a parent matching the text.
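As a quick sanity check of that rule, here is a minimal pure-Python sketch that applies it to the test docs without Elasticsearch. The tokenizer is only a rough stand-in for the standard analyzer, and the child labels `c1` to `c4` are hypothetical (the real children get auto-generated IDs).

```python
import re

PARENTS = {
    1: "some special text",
    2: "some different text",
    3: "we don't want this parent",
}
# (hypothetical child label, parent id, content)
CHILDREN = [
    ("c1", 1, "text that is special"),
    ("c2", 1, "text that is not"),
    ("c3", 2, "different child text, but special"),
    ("c4", 3, "or this child"),
]


def matches(text, term):
    # Rough stand-in for the standard analyzer: lowercase word tokens.
    return term.lower() in re.findall(r"\w+", text.lower())


def expected_hits(term):
    parent_hits = {pid for pid, desc in PARENTS.items() if matches(desc, term)}
    child_hits = {label for label, _, content in CHILDREN if matches(content, term)}
    # has_child: parents with at least one matching child
    parents_via_child = {pid for label, pid, _ in CHILDREN if label in child_hits}
    # has_parent: children whose parent matches
    children_via_parent = {label for label, pid, _ in CHILDREN if pid in parent_hits}
    return parent_hits | parents_via_child, child_hits | children_via_parent


parents, children = expected_hits("special")
print(sorted(parents), sorted(children))
```

Five docs survive (parents 1 and 2, children `c1` to `c3`), which is the total the bool query further down should return.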
We can get back parents that match the query like this:
POST /test_index/parent_doc/_search
{
   "query": {
      "match": { "description": "special" }
   },
   "highlight": {
      "fields": {
         "description": {},
         "identifier": {}
      }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": { "total": 1, "successful": 1, "failed": 0 },
   "hits": {
      "total": 1,
      "max_score": 1.1263815,
      "hits": [
         {
            "_index": "test_index",
            "_type": "parent_doc",
            "_id": "1",
            "_score": 1.1263815,
            "_source": { "identifier": "first", "description": "some special text" },
            "highlight": { "description": [ "some <em>special</em> text" ] }
         }
      ]
   }
}
And we can get back children that match the query like this:
POST /test_index/child_doc/_search
{
   "query": {
      "match": { "content": "special" }
   },
   "highlight": {
      "fields": { "content": {} }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": { "total": 1, "successful": 1, "failed": 0 },
   "hits": {
      "total": 2,
      "max_score": 0.92364895,
      "hits": [
         {
            "_index": "test_index",
            "_type": "child_doc",
            "_id": "geUFenxITZSL7epvB568uA",
            "_score": 0.92364895,
            "_source": { "content": "text that is special" },
            "highlight": { "content": [ "text that is <em>special</em>" ] }
         },
         {
            "_index": "test_index",
            "_type": "child_doc",
            "_id": "IMHXhM3VRsCLGkshx52uAQ",
            "_score": 0.80819285,
            "_source": { "content": "different child text, but special" },
            "highlight": { "content": [ "different child text, but <em>special</em>" ] }
         }
      ]
   }
}
We can get back both the parents and the children that match the text in a single request like this:
POST /test_index/parent_doc,child_doc/_search
{
   "query": {
      "multi_match": {
         "query": "special",
         "fields": [ "description", "content" ]
      }
   },
   "highlight": {
      "fields": {
         "description": {},
         "identifier": {},
         "content": {}
      }
   }
}
...
{
   "took": 3,
   "timed_out": false,
   "_shards": { "total": 1, "successful": 1, "failed": 0 },
   "hits": {
      "total": 3,
      "max_score": 1.1263815,
      "hits": [
         {
            "_index": "test_index",
            "_type": "parent_doc",
            "_id": "1",
            "_score": 1.1263815,
            "_source": { "identifier": "first", "description": "some special text" },
            "highlight": { "description": [ "some <em>special</em> text" ] }
         },
         {
            "_index": "test_index",
            "_type": "child_doc",
            "_id": "geUFenxITZSL7epvB568uA",
            "_score": 0.75740534,
            "_source": { "content": "text that is special" },
            "highlight": { "content": [ "text that is <em>special</em>" ] }
         },
         {
            "_index": "test_index",
            "_type": "child_doc",
            "_id": "IMHXhM3VRsCLGkshx52uAQ",
            "_score": 0.6627297,
            "_source": { "content": "different child text, but special" },
            "highlight": { "content": [ "different child text, but <em>special</em>" ] }
         }
      ]
   }
}
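Since parent and child hits come back mixed in one response, it can help to pull the highlight fragments out keyed by type and ID on the client side. A minimal Python sketch, assuming the response has already been parsed from JSON into a dict; `collect_highlights` is a hypothetical helper and `RESPONSE` is the response above trimmed to the keys it uses.

```python
# Trimmed copy of the multi_match response above (only the keys used here).
RESPONSE = {
    "hits": {
        "total": 3,
        "hits": [
            {"_type": "parent_doc", "_id": "1",
             "highlight": {"description": ["some <em>special</em> text"]}},
            {"_type": "child_doc", "_id": "geUFenxITZSL7epvB568uA",
             "highlight": {"content": ["text that is <em>special</em>"]}},
            {"_type": "child_doc", "_id": "IMHXhM3VRsCLGkshx52uAQ",
             "highlight": {"content": ["different child text, but <em>special</em>"]}},
        ],
    }
}


def collect_highlights(response):
    """Map (type, id) -> {field: [fragments]} for every hit that has highlights."""
    result = {}
    for hit in response["hits"]["hits"]:
        # Hits that matched only through has_child/has_parent may carry no "highlight".
        fragments = hit.get("highlight")
        if fragments:
            result[(hit["_type"], hit["_id"])] = fragments
    return result


for key, fields in collect_highlights(RESPONSE).items():
    print(key, fields)
```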
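However, to get back all the docs related to this query, including parents whose children match and children whose parent matches, we need to use a bool query with has_child and has_parent clauses: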
POST /test_index/parent_doc,child_doc/_search
{
   "query": {
      "bool": {
         "should": [
            {
               "multi_match": {
                  "query": "special",
                  "fields": [ "description", "content" ]
               }
            },
            {
               "has_child": {
                  "type": "child_doc",
                  "query": {
                     "match": { "content": "special" }
                  }
               }
            },
            {
               "has_parent": {
                  "type": "parent_doc",
                  "query": {
                     "match": { "description": "special" }
                  }
               }
            }
         ]
      }
   },
   "highlight": {
      "fields": {
         "description": {},
         "identifier": {},
         "content": {}
      }
   },
   "fields": [ "_parent", "_source" ]
}
...
{
   "took": 5,
   "timed_out": false,
   "_shards": { "total": 1, "successful": 1, "failed": 0 },
   "hits": {
      "total": 5,
      "max_score": 0.8866254,
      "hits": [
         {
            "_index": "test_index",
            "_type": "parent_doc",
            "_id": "1",
            "_score": 0.8866254,
            "_source": { "identifier": "first", "description": "some special text" },
            "highlight": { "description": [ "some <em>special</em> text" ] }
         },
         {
            "_index": "test_index",
            "_type": "child_doc",
            "_id": "geUFenxITZSL7epvB568uA",
            "_score": 0.67829096,
            "_source": { "content": "text that is special" },
            "fields": { "_parent": "1" },
            "highlight": { "content": [ "text that is <em>special</em>" ] }
         },
         {
            "_index": "test_index",
            "_type": "child_doc",
            "_id": "IMHXhM3VRsCLGkshx52uAQ",
            "_score": 0.18709806,
            "_source": { "content": "different child text, but special" },
            "fields": { "_parent": "2" },
            "highlight": { "content": [ "different child text, but <em>special</em>" ] }
         },
         {
            "_index": "test_index",
            "_type": "child_doc",
            "_id": "NiwsP2VEQBKjqu1M4AdjCg",
            "_score": 0.12531912,
            "_source": { "content": "text that is not" },
            "fields": { "_parent": "1" }
         },
         {
            "_index": "test_index",
            "_type": "parent_doc",
            "_id": "2",
            "_score": 0.12531912,
            "_source": { "identifier": "second", "description": "some different text" }
         }
      ]
   }
}
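If you build this request from application code, a small helper keeps the three `should` clauses in sync when the search text changes. A minimal Python sketch, assuming the same type and field names as above; `related_docs_query` is a hypothetical helper that only builds the JSON body for `POST /test_index/parent_doc,child_doc/_search`, so send it with whatever HTTP or Elasticsearch client you already use.

```python
import json


def related_docs_query(text, parent_type="parent_doc", child_type="child_doc",
                       parent_field="description", child_field="content"):
    """Bool/should body: direct match, parent with matching child, child with matching parent."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Docs whose own text matches
                    {"multi_match": {"query": text, "fields": [parent_field, child_field]}},
                    # Parents that have at least one matching child
                    {"has_child": {"type": child_type,
                                   "query": {"match": {child_field: text}}}},
                    # Children whose parent matches
                    {"has_parent": {"type": parent_type,
                                    "query": {"match": {parent_field: text}}}},
                ]
            }
        },
        "highlight": {"fields": {parent_field: {}, "identifier": {}, child_field: {}}},
        "fields": ["_parent", "_source"],
    }


print(json.dumps(related_docs_query("special"), indent=2))
```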
(I included the "_parent" field in the results to make it easier to see why each doc was returned.)
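Because each child hit carries its `_parent` value, the flat hit list can be folded back into per-parent groups on the client side. A minimal Python sketch, assuming the hits were parsed from the bool-query response above (trimmed to the keys used); `group_by_parent` and `HITS` are hypothetical names.

```python
from collections import defaultdict

# Trimmed copy of the bool-query hits above.
HITS = [
    {"_type": "parent_doc", "_id": "1"},
    {"_type": "child_doc", "_id": "geUFenxITZSL7epvB568uA", "fields": {"_parent": "1"}},
    {"_type": "child_doc", "_id": "IMHXhM3VRsCLGkshx52uAQ", "fields": {"_parent": "2"}},
    {"_type": "child_doc", "_id": "NiwsP2VEQBKjqu1M4AdjCg", "fields": {"_parent": "1"}},
    {"_type": "parent_doc", "_id": "2"},
]


def group_by_parent(hits):
    """Return {parent_id: {"parent": bool, "children": [child ids]}}."""
    groups = defaultdict(lambda: {"parent": False, "children": []})
    for hit in hits:
        if hit["_type"] == "parent_doc":
            # The parent doc itself was one of the hits.
            groups[hit["_id"]]["parent"] = True
        else:
            # Child hits are attached to the group of their _parent.
            groups[hit["fields"]["_parent"]]["children"].append(hit["_id"])
    return dict(groups)


for parent_id, group in sorted(group_by_parent(HITS).items()):
    print(parent_id, group)
```

A group can end up with `"parent": False` if a child is returned but its parent is not among the hits (for example, when paging cuts the result list), so treat that flag as "parent was in this page of hits" rather than "parent matched".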
Let me know if this helps.
Here is the code I used:
http://sense.qbox.io/gist/d69a4d6531dc063faa4b4e094cff2a472a73c5a6