How to highlight nested fields in Elasticsearch


There are several things you can do here with a parent/child relationship. I'll go over a few, and hopefully that will point you in the right direction; it will still take a lot of testing to figure out whether this approach is more performant for you. I also left out a few details of your setup for clarity. Please forgive the long post.

I set up a parent/child mapping as follows:

DELETE /test_index

PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
   },
   "mappings": {
      "parent_doc": {
         "properties": {
            "identifier": {
               "type": "string"
            },
            "description": {
               "type": "string"
            }
         }
      },
      "child_doc": {
         "_parent": {
            "type": "parent_doc"
         },
         "properties": {
            "content": {
               "type": "string"
            }
         }
      }
   }
}

Then added some test docs:

POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"parent_doc","_id":1}}
{"identifier": "first", "description":"some special text"}
{"index":{"_index":"test_index","_type":"child_doc","_parent":1}}
{"content":"text that is special"}
{"index":{"_index":"test_index","_type":"child_doc","_parent":1}}
{"content":"text that is not"}
{"index":{"_index":"test_index","_type":"parent_doc","_id":2}}
{"identifier": "second", "description":"some different text"}
{"index":{"_index":"test_index","_type":"child_doc","_parent":2}}
{"content":"different child text, but special"}
{"index":{"_index":"test_index","_type":"parent_doc","_id":3}}
{"identifier": "third", "description":"we don't want this parent"}
{"index":{"_index":"test_index","_type":"child_doc","_parent":3}}
{"content":"or this child"}
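If you generate the bulk request from code rather than pasting it into Sense, keep in mind that the bulk body is newline-delimited JSON: one action line, then one source line, and a trailing newline at the end. Here is a minimal Python sketch of that layout. The helper name `build_bulk_body` and the tuple shape are my own inventions for illustration, and nothing here talks to a cluster.

```python
import json

def build_bulk_body(docs, index="test_index"):
    """Serialize (doc_type, doc_id, parent_id, source) tuples as a bulk body.

    Hypothetical helper: each document becomes two lines, an action line
    followed by a source line.
    """
    lines = []
    for doc_type, doc_id, parent_id, source in docs:
        meta = {"_index": index, "_type": doc_type}
        if doc_id is not None:
            meta["_id"] = doc_id
        if parent_id is not None:
            meta["_parent"] = parent_id  # ties the child to its parent doc
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(source))
    # Newline-delimited JSON, terminated by a final newline.
    return "\n".join(lines) + "\n"

docs = [
    ("parent_doc", 1, None, {"identifier": "first", "description": "some special text"}),
    ("child_doc", None, 1, {"content": "text that is special"}),
]
body = build_bulk_body(docs)
print(body)
```

You would send `body` as the payload of `POST /test_index/_bulk` with whatever HTTP client you already use.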

If I'm understanding your specs correctly, we would want a query for "special" to return every one of these documents except the last two (correct me if I'm wrong). We want docs that match the text, have a child that matches the text, or have a parent that matches the text.
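To make that rule concrete, here is a small plain-Python model of the test data that computes which docs we expect back. It does not touch Elasticsearch at all; the child labels (`c1` to `c4`) are made up for the sketch, and the whitespace split is only a crude stand-in for the real analyzer.

```python
# Plain-Python model of the test data; no Elasticsearch involved.
parents = {
    "1": "some special text",
    "2": "some different text",
    "3": "we don't want this parent",
}
# (child label, parent id, content); the labels are hypothetical.
children = [
    ("c1", "1", "text that is special"),
    ("c2", "1", "text that is not"),
    ("c3", "2", "different child text, but special"),
    ("c4", "3", "or this child"),
]

def contains(text, term):
    # Crude tokenization: lowercase and split on whitespace.
    return term in text.lower().split()

def expected_hits(term):
    """Docs that match, have a matching child, or have a matching parent."""
    hits = set()
    for pid, description in parents.items():
        own = contains(description, term)
        via_child = any(contains(c, term) for _, p, c in children if p == pid)
        if own or via_child:
            hits.add(("parent_doc", pid))
    for label, pid, content in children:
        own = contains(content, term)
        via_parent = contains(parents[pid], term)
        if own or via_parent:
            hits.add(("child_doc", label))
    return hits

for doc in sorted(expected_hits("special")):
    print(doc)
```

Parent 3 and its child are the only docs left out, which is the behavior described above.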

We can get back parents that match the query like this:

POST /test_index/parent_doc/_search
{
    "query": {
        "match": {
           "description": "special"
        }
    },
    "highlight": {
        "fields": {
            "description": {},
            "identifier": {}
        }
    }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1.1263815,
      "hits": [
         {
            "_index": "test_index",
            "_type": "parent_doc",
            "_id": "1",
            "_score": 1.1263815,
            "_source": {
               "identifier": "first",
               "description": "some special text"
            },
            "highlight": {
               "description": [
                  "some <em>special</em> text"
               ]
            }
         }
      ]
   }
}

And we can get back children that match the query like this:

POST /test_index/child_doc/_search
{
    "query": {
        "match": {
           "content": "special"
        }
    },
    "highlight": {
        "fields": {
            "content": {}
        }
    }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.92364895,
      "hits": [
         {
            "_index": "test_index",
            "_type": "child_doc",
            "_id": "geUFenxITZSL7epvB568uA",
            "_score": 0.92364895,
            "_source": {
               "content": "text that is special"
            },
            "highlight": {
               "content": [
                  "text that is <em>special</em>"
               ]
            }
         },
         {
            "_index": "test_index",
            "_type": "child_doc",
            "_id": "IMHXhM3VRsCLGkshx52uAQ",
            "_score": 0.80819285,
            "_source": {
               "content": "different child text, but special"
            },
            "highlight": {
               "content": [
                  "different child text, but <em>special</em>"
               ]
            }
         }
      ]
   }
}

We can get back parents that match the text and children that match the text like this:

POST /test_index/parent_doc,child_doc/_search
{
    "query": {
        "multi_match": {
           "query": "special",
           "fields": ["description", "content"]
        }
    },
    "highlight": {
        "fields": {
            "description": {},
            "identifier": {},
            "content": {}
        }
    }
}
...
{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 1.1263815,
      "hits": [
         {
            "_index": "test_index",
            "_type": "parent_doc",
            "_id": "1",
            "_score": 1.1263815,
            "_source": {
               "identifier": "first",
               "description": "some special text"
            },
            "highlight": {
               "description": [
                  "some <em>special</em> text"
               ]
            }
         },
         {
            "_index": "test_index",
            "_type": "child_doc",
            "_id": "geUFenxITZSL7epvB568uA",
            "_score": 0.75740534,
            "_source": {
               "content": "text that is special"
            },
            "highlight": {
               "content": [
                  "text that is <em>special</em>"
               ]
            }
         },
         {
            "_index": "test_index",
            "_type": "child_doc",
            "_id": "IMHXhM3VRsCLGkshx52uAQ",
            "_score": 0.6627297,
            "_source": {
               "content": "different child text, but special"
            },
            "highlight": {
               "content": [
                  "different child text, but <em>special</em>"
               ]
            }
         }
      ]
   }
}

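However, to get back every doc related to the query (including parents matched only through a child, and children matched only through a parent), we need a bool query that combines the text match with has_child and has_parent clauses: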

POST /test_index/parent_doc,child_doc/_search
{
   "query": {
      "bool": {
         "should": [
            {
               "multi_match": {
                  "query": "special",
                  "fields": [
                     "description",
                     "content"
                  ]
               }
            },
            {
               "has_child": {
                  "type": "child_doc",
                  "query": {
                     "match": {
                        "content": "special"
                     }
                  }
               }
            },
            {
               "has_parent": {
                  "type": "parent_doc",
                  "query": {
                     "match": {
                        "description": "special"
                     }
                  }
               }
            }
         ]
      }
   },
    "highlight": {
        "fields": {
            "description": {},
            "identifier": {},
            "content": {}
        }
    },
    "fields": ["_parent", "_source"]
}
...
{
   "took": 5,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 5,
      "max_score": 0.8866254,
      "hits": [
         {
            "_index": "test_index",
            "_type": "parent_doc",
            "_id": "1",
            "_score": 0.8866254,
            "_source": {
               "identifier": "first",
               "description": "some special text"
            },
            "highlight": {
               "description": [
                  "some <em>special</em> text"
               ]
            }
         },
         {
            "_index": "test_index",
            "_type": "child_doc",
            "_id": "geUFenxITZSL7epvB568uA",
            "_score": 0.67829096,
            "_source": {
               "content": "text that is special"
            },
            "fields": {
               "_parent": "1"
            },
            "highlight": {
               "content": [
                  "text that is <em>special</em>"
               ]
            }
         },
         {
            "_index": "test_index",
            "_type": "child_doc",
            "_id": "IMHXhM3VRsCLGkshx52uAQ",
            "_score": 0.18709806,
            "_source": {
               "content": "different child text, but special"
            },
            "fields": {
               "_parent": "2"
            },
            "highlight": {
               "content": [
                  "different child text, but <em>special</em>"
               ]
            }
         },
         {
            "_index": "test_index",
            "_type": "child_doc",
            "_id": "NiwsP2VEQBKjqu1M4AdjCg",
            "_score": 0.12531912,
            "_source": {
               "content": "text that is not"
            },
            "fields": {
               "_parent": "1"
            }
         },
         {
            "_index": "test_index",
            "_type": "parent_doc",
            "_id": "2",
            "_score": 0.12531912,
            "_source": {
               "identifier": "second",
               "description": "some different text"
            }
         }
      ]
   }
}
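If you need to build this request in application code, the search term appears in three places, so it is worth generating the body from a single argument. Below is a minimal Python sketch that reproduces the same body as a dict. The function name `build_related_query` is hypothetical, and the type and field names are simply the ones from my test index.

```python
import json

def build_related_query(term):
    """Build the body for the 'matches, has matching child, or has matching parent' search."""
    highlight_fields = ["description", "identifier", "content"]
    return {
        "query": {
            "bool": {
                "should": [
                    # Docs whose own text matches.
                    {"multi_match": {"query": term, "fields": ["description", "content"]}},
                    # Parents with at least one matching child.
                    {"has_child": {"type": "child_doc", "query": {"match": {"content": term}}}},
                    # Children whose parent matches.
                    {"has_parent": {"type": "parent_doc", "query": {"match": {"description": term}}}},
                ]
            }
        },
        "highlight": {"fields": {name: {} for name in highlight_fields}},
        "fields": ["_parent", "_source"],
    }

body = build_related_query("special")
print(json.dumps(body, indent=3))
```

Since every clause sits under "should", a doc only has to satisfy one of them to be returned, and docs that satisfy more than one score higher.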

(I included the "_parent" field in the request to make it easier to see why each doc was included in the results.)
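On the client side, that "_parent" value is what lets you stitch the flat hit list back into parent/child groups. Here is a rough Python sketch that works on trimmed copies of the five hits from the last response. The helper names `group_by_parent` and `matched_directly` are made up, and the highlight check relies only on what that response shows: hits pulled in purely through has_child or has_parent came back without a "highlight" key.

```python
from collections import defaultdict

def group_by_parent(hits):
    """Group search hits under the id of the parent they belong to."""
    groups = defaultdict(lambda: {"parent": None, "children": []})
    for hit in hits:
        if hit["_type"] == "parent_doc":
            groups[hit["_id"]]["parent"] = hit
        else:
            # Child hits expose their parent id because "_parent" was requested.
            groups[hit["fields"]["_parent"]]["children"].append(hit)
    return dict(groups)

def matched_directly(hit):
    # In the response shown, only hits whose own text matched carry a highlight.
    return "highlight" in hit

# Trimmed copies of the hits from the bool query response.
hits = [
    {"_type": "parent_doc", "_id": "1",
     "highlight": {"description": ["some <em>special</em> text"]}},
    {"_type": "child_doc", "_id": "geUFenxITZSL7epvB568uA", "fields": {"_parent": "1"},
     "highlight": {"content": ["text that is <em>special</em>"]}},
    {"_type": "child_doc", "_id": "IMHXhM3VRsCLGkshx52uAQ", "fields": {"_parent": "2"},
     "highlight": {"content": ["different child text, but <em>special</em>"]}},
    {"_type": "child_doc", "_id": "NiwsP2VEQBKjqu1M4AdjCg", "fields": {"_parent": "1"}},
    {"_type": "parent_doc", "_id": "2"},
]

groups = group_by_parent(hits)
for parent_id, group in sorted(groups.items()):
    child_ids = [child["_id"] for child in group["children"]]
    print(parent_id, matched_directly(group["parent"]), child_ids)
```

If a child comes back without its parent in the same page of results, the group's "parent" stays `None`, so real code would need to handle that case.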

Let me know if this helps.

Here is the code I used:

http://sense.qbox.io/gist/d69a4d6531dc063faa4b4e094cff2a472a73c5a6