ElasticSearch Join Filter: Using subquery results as filter input possible?


Here's a link to a runnable example:

http://sense.qbox.io/gist/9da6a30fc12c36f90ae39111a08df283b56ec03c

It presumes documents that look like:

{ "transaction_type" : "some_transaction", "user_base" : "some_user_base_id" }

The query sets "size" to 0 so that no hits are returned; the aggregations take care of computing the stats you're looking for:

{
  "size" : 0,
  "query" : {
    "match_all" : {}
  },
  "aggs" : {
    "distinct_transactions" : {
      "terms" : {
        "field" : "transaction_type",
        "size" : 20
      },
      "aggs" : {
        "by_user_base" : {
          "terms" : {
            "field" : "user_base",
            "size" : 20
          }
        }
      }
    }
  }
}

And here's what the result looks like:

"aggregations": {
   "distinct_transactions": {
      "buckets": [
         {
            "key": "subscribe",
            "doc_count": 4,
            "by_user_base": {
               "buckets": [
                  {
                     "key": "2",
                     "doc_count": 3
                  },
                  {
                     "key": "1",
                     "doc_count": 1
                  }
               ]
            }
         },
         {
            "key": "purchase",
            "doc_count": 3,
            "by_user_base": {
               "buckets": [
                  {
                     "key": "1",
                     "doc_count": 2
                  },
                  {
                     "key": "2",
                     "doc_count": 1
                  }
               ]
            }
         }
      ]
   }
}

So, inside of "aggregations", you'll find "distinct_transactions", which holds a list of buckets. Each bucket's key is the transaction type, and its doc_count is the total number of transactions of that type across all users.

Inside each of those buckets there's "by_user_base", which is another terms agg (nested). Just like the transactions, the key is the user base name (or ID or whatever) and the doc_count is that user base's number of transactions of that type.
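If it helps, here is a minimal Python sketch of walking that nested bucket structure on the client side. The response dict is just the sample result from above pasted in by hand; the helper name flatten_buckets is made up for illustration and is not part of any Elasticsearch client.

```python
# Sample response, copied from the aggregation result shown above.
response = {
    "aggregations": {
        "distinct_transactions": {
            "buckets": [
                {"key": "subscribe", "doc_count": 4,
                 "by_user_base": {"buckets": [
                     {"key": "2", "doc_count": 3},
                     {"key": "1", "doc_count": 1}]}},
                {"key": "purchase", "doc_count": 3,
                 "by_user_base": {"buckets": [
                     {"key": "1", "doc_count": 2},
                     {"key": "2", "doc_count": 1}]}},
            ]
        }
    }
}


def flatten_buckets(resp):
    """Return a list of (transaction_type, user_base, count) tuples.

    Hypothetical helper: the outer loop visits each transaction-type bucket,
    the inner loop visits the user-base buckets nested inside it.
    """
    rows = []
    for tx in resp["aggregations"]["distinct_transactions"]["buckets"]:
        for ub in tx["by_user_base"]["buckets"]:
            rows.append((tx["key"], ub["key"], ub["doc_count"]))
    return rows


for tx_type, user_base, count in flatten_buckets(response):
    print(f"{tx_type}\t{user_base}\t{count}")
```

Each row pairs one transaction type with one user base, so the counts for a given transaction type add up to that outer bucket's doc_count (as long as the inner "size" is large enough to include every user base).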

Is that what you were looking to do? Hope I helped.


With the current version of ElasticSearch, there's the new significant_terms aggregation type, which can be used to calculate the affinity scores for my use case more simply.

All the metrics relevant to me can then be calculated in one step, which is very nice!
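The actual query isn't shown here, so the following is only a rough sketch of what a significant_terms request body could look like, written as a Python dict. It assumes the field names from the sample document above ("transaction_type", "user_base"); the function name and the aggregation name "significant_transactions" are made up for illustration. The term query picks one user base as the foreground set, and the aggregation compares it against the rest of the index.

```python
import json


def significant_terms_body(user_base, size=10):
    """Build a search body (hypothetical helper, assumed field names).

    Foreground set: documents of one user base (the term query).
    Background set: the whole index (the significant_terms default).
    """
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "query": {"term": {"user_base": user_base}},
        "aggs": {
            "significant_transactions": {
                "significant_terms": {
                    "field": "transaction_type",
                    "size": size,
                }
            }
        },
    }


# Print the body so it can be pasted into a search request.
print(json.dumps(significant_terms_body("1"), indent=2))
```

Each bucket in the significant_terms result carries a score and a bg_count next to the usual key and doc_count, which is where an affinity-style metric would come from.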