Querying in Elasticsearch
Before you can go further, there are two steps that need to be done:
- The jarFileId and dependedntClass fields should be mapped as the keyword type (if this is a problem, you can add a multi-field sub-field of the keyword type and use that in the query).
- dependencies should be a nested object.
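The two mapping requirements above could look roughly like the following index-creation body. This is only a sketch: it assumes Elasticsearch 7 or later, where mappings have no type name (older versions need an extra level with the mapping type), and it assumes dependedntClass lives inside each dependencies entry, as the path dependencies.dependedntClass used in the query suggests. Any other fields in your documents are left out.

```json
{
  "mappings": {
    "properties": {
      "jarFileId": { "type": "keyword" },
      "dependencies": {
        "type": "nested",
        "properties": {
          "dependedntClass": { "type": "keyword" }
        }
      }
    }
  }
}
```

Changing the type of an existing field generally means creating a new index with this mapping and reindexing the data into it.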
Looking at your data, the joining element between these two types of documents is the jarFileId field. If your existing query returned, for example, this list of jars:
{[{"jarFileId": "JAR-0001"},{"jarFileId": "JAR-0002"}]}
having this information, you can use this query:
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "terms": { "jarFileId": ["JAR-0001", "JAR-0002"] }
      }
    }
  },
  "aggs": {
    "filtered": {
      "filter": {
        "constant_score": {
          "filter": {
            "terms": { "jarFileId": ["JAR-0001", "JAR-0002"] }
          }
        }
      },
      "aggs": {
        "dependent": {
          "nested": { "path": "dependencies" },
          "aggs": {
            "classes": {
              "terms": { "field": "dependencies.dependedntClass" }
            }
          }
        }
      }
    }
  }
}
And as a result you'll get:
{
  ...,
  "aggregations": {
    "filtered": {
      "doc_count": 1,
      "dependent": {
        "doc_count": 3,
        "classes": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            { "key": "core/internal/TrackingEventQueue$TrackingException", "doc_count": 1 },
            { "key": "java/awt/EventQueue", "doc_count": 1 },
            { "key": "java/lang/RuntimeException", "doc_count": 1 }
          ]
        }
      }
    }
  }
}
With your current model, it is not possible to do this with one query: Elasticsearch does not have a join mechanism. A single document should contain all the necessary information so that Elasticsearch can decide whether it matches the query or not. This is nicely described here. So either you go with application-side joins (there is an example similar to yours under the link), or you denormalize your data if search performance is the core issue here. The only built-in "join mechanism" that I'm aware of is Terms Filter Lookup, but it can operate only on the id field.
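To illustrate that limitation, here is a rough sketch of a terms lookup query body. The index name jar-lists, the document id 1, and the field jarFileIds are all hypothetical; they stand for a document you would have to store yourself, holding the list of jar ids. Versions before 7 also expect a type entry next to index.

```json
{
  "query": {
    "terms": {
      "jarFileId": {
        "index": "jar-lists",
        "id": "1",
        "path": "jarFileIds"
      }
    }
  }
}
```

The terms are read from one document that you name by its id, not from the result of another query, which is why this does not replace a real join for your case.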