Simulating a join in ElasticSearch Simulating a join in ElasticSearch elasticsearch elasticsearch

Simulating a join in ElasticSearch


You can do it like this:

  • first you have a query that filters your documents with actions 1 and 2 only (I have no idea if you can have other action types)
  • then the magic is with aggregations
    • the first aggregation is a terms one for user_id, so that you can do individual calculations per user
    • then you use a cardinality sub-aggregation to count the number of distinct actions per user. Since the query is for actions 1 and 2 that number can only be 1 or 2
    • then you use a bucket_selector sub-aggregation to only keep those users that have the cardinality result of 2.
{  "size": 0,  "query": {    "bool": {      "should": [        {          "terms": {            "action_id": [              1,              2            ]          }        }      ]    }  },  "aggs": {    "users": {      "terms": {        "field": "user_id",        "size": 10      },      "aggs": {        "actions": {          "cardinality": {            "field": "action_id"          }        },        "actions_count_bucket_filter": {          "bucket_selector": {            "buckets_path": {              "totalActions": "actions"            },            "script": "totalActions >= 2"          }        }      }    }  }}

The result will look like this:

   "aggregations": {      "users": {         "doc_count_error_upper_bound": 0,         "sum_other_doc_count": 0,         "buckets": [            {               "key": 1,               "doc_count": 2,               "actions": {                  "value": 2               }            },            {               "key": 5,               "doc_count": 2,               "actions": {                  "value": 2               }            }         ]      }   }

The keys are the user_ids whose actions are 1 and 2. bucket_selector aggregation is available in 2.x+ version of ES.