Querying in elasticsearch


Before you go further, two things need to be done:

  • The jarFileId and dependedntClass fields should be mapped as the keyword type (if changing the existing mapping is a problem, you can add a keyword sub-field through a multi-field and use that sub-field in the query).
  • dependencies should be mapped as a nested object.
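
As a rough illustration of those two mapping changes, here is a minimal Python sketch that builds an index-creation body. It assumes the typeless mapping format of Elasticsearch 7 or later (older versions nest the properties under a type name). The helper names and the sub-field name "raw" are hypothetical; only jarFileId, dependencies and dependedntClass come from the data above.

```python
import json


def build_mapping(use_multi_field=False, subfield="raw"):
    """Return an index-creation body with both mapping changes applied.

    `subfield` is a made-up name for the keyword sub-field; pick any name.
    """
    def exact_field():
        if use_multi_field:
            # Keep a text mapping and add a keyword sub-field next to it.
            return {"type": "text", "fields": {subfield: {"type": "keyword"}}}
        # Plain keyword mapping: values are indexed as-is, not analyzed.
        return {"type": "keyword"}

    return {
        "mappings": {
            "properties": {
                "jarFileId": exact_field(),
                "dependencies": {
                    # nested keeps each dependency as its own hidden document
                    "type": "nested",
                    "properties": {"dependedntClass": exact_field()},
                },
            }
        }
    }


def query_field(name, use_multi_field=False, subfield="raw"):
    """Field name to use in queries: the sub-field when a multi-field is used."""
    return f"{name}.{subfield}" if use_multi_field else name


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
```

With the multi-field variant, the queries would have to reference jarFileId.raw and dependencies.dependedntClass.raw (or whatever sub-field name you choose) instead of the bare field names.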

Looking at your data, the element that joins these two types of documents is the jarFileId field. Suppose your existing query returned, for example, this list of jars:

{[{"jarFileId": "JAR-0001"},{"jarFileId": "JAR-0002"}]}

With this list, you can run the following query:

{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "terms": { "jarFileId": ["JAR-0001", "JAR-0002"] }
      }
    }
  },
  "aggs": {
    "filtered": {
      "filter": {
        "constant_score": {
          "filter": {
            "terms": { "jarFileId": ["JAR-0001", "JAR-0002"] }
          }
        }
      },
      "aggs": {
        "dependent": {
          "nested": { "path": "dependencies" },
          "aggs": {
            "classes": {
              "terms": { "field": "dependencies.dependedntClass" }
            }
          }
        }
      }
    }
  }
}

And as a result you'll get:

{
  ...,
  "aggregations": {
    "filtered": {
      "doc_count": 1,
      "dependent": {
        "doc_count": 3,
        "classes": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            { "key": "core/internal/TrackingEventQueue$TrackingException", "doc_count": 1 },
            { "key": "java/awt/EventQueue", "doc_count": 1 },
            { "key": "java/lang/RuntimeException", "doc_count": 1 }
          ]
        }
      }
    }
  }
}
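
If you build this request in application code, the query and the result handling above could be sketched in Python roughly as follows. This is only an illustration: the function names are hypothetical, and the code just assembles the same request body and walks the same response structure shown above, without talking to a cluster.

```python
def jar_filter(jar_ids):
    """constant_score/terms filter on jarFileId, as used twice in the query."""
    return {"constant_score": {"filter": {"terms": {"jarFileId": list(jar_ids)}}}}


def build_dependency_query(jar_ids, class_field="dependencies.dependedntClass"):
    """Build the aggregation request for the given jar ids."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": jar_filter(jar_ids),
        "aggs": {
            "filtered": {
                "filter": jar_filter(jar_ids),
                "aggs": {
                    "dependent": {
                        # step into the nested dependencies objects
                        "nested": {"path": "dependencies"},
                        "aggs": {"classes": {"terms": {"field": class_field}}},
                    }
                },
            }
        },
    }


def extract_classes(response):
    """Return (class name, doc_count) pairs from the 'classes' buckets."""
    buckets = response["aggregations"]["filtered"]["dependent"]["classes"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]
```

Passing the dictionary from build_dependency_query(["JAR-0001", "JAR-0002"]) as the search body should reproduce the request above, and extract_classes applied to the parsed response gives the dependent class names with their counts.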

With your current model, it is not possible to do this with one query, because Elasticsearch does not have a join mechanism. A single document must contain all the information Elasticsearch needs to decide whether it matches the query or not. This is nicely described here. So either you go with application-side joins (there is an example similar to yours under the link) or you denormalize your data, if search performance is the core issue here. The only built-in "join mechanism" that I'm aware of is Term Filter Lookup, but it can only fetch the terms from a document identified by its id.
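
To make the application-side join concrete, here is a minimal Python sketch of the two-step flow. It assumes a hypothetical `search` callable that takes a request body as a dictionary and returns the parsed JSON response, so it is not tied to any particular client library. The function and parameter names are made up for illustration; only the hits.hits[]._source response layout and the jarFileId field come from Elasticsearch and the data above.

```python
def application_side_join(search, first_query, build_second_query,
                          join_field="jarFileId"):
    """Run the first query, collect the join values, then run the second query.

    search             -- hypothetical callable: request body (dict) -> response (dict)
    first_query        -- request body of your existing query
    build_second_query -- callable: list of join values -> request body
    """
    first_response = search(first_query)

    # Collect distinct join values from the hits, preserving their order.
    join_values = []
    for hit in first_response["hits"]["hits"]:
        value = hit["_source"].get(join_field)
        if value is not None and value not in join_values:
            join_values.append(value)

    if not join_values:
        return None  # nothing to join on, so skip the second round trip

    return search(build_second_query(join_values))
```

Note that a search returns only the first page of hits (10 by default), so the first query would need a suitable size, or paging, to collect every jarFileId before the second query is built.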