
Significant Terms Aggregation of "flat" structures


It sounds like you're trying to build an item-based recommender. Apache Mahout has tools to help with collaborative filtering (formerly the Taste project).

There is also a Taste plugin for Elasticsearch 1.5.x which I believe can work with data like yours to produce item-based recommendations.

(Note: This plugin uses Rivers, which were deprecated in Elasticsearch 1.5, so I'd check with the authors about their plans to support more recent versions of Elasticsearch before adopting this suggestion.)


I don't have as much data as you do to test this against, but try the following:

  1. Get the list of itemIds for bundles that contain the productId you want to find "stuff" for:

    {
      "query": {
        "filtered": {
          "filter": {
            "term": {
              "productId": 1234
            }
          }
        }
      },
      "fields": [
        "itemId"
      ]
    }

Then

  2. Using this list, create this query:

    GET /sales/sales/_search?search_type=count
    {
      "query": {
        "filtered": {
          "filter": {
            "terms": {
              "itemId": [1, 2, 3, 4, 5, 6, 7, 11]
            }
          }
        }
      },
      "aggs": {
        "most_sig": {
          "significant_terms": {
            "field": "productId",
            "size": 0
          }
        }
      }
    }
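To show how the two steps chain together, here is a minimal Python sketch that builds both request bodies and pulls the itemIds out of a step-1 response. The function names are made up for illustration, and the response handling assumes the usual `hits.hits[].fields` layout of a `fields` request. It only builds dictionaries; sending them to the cluster is left to whatever client you use. Keep in mind that a search returns only 10 hits by default, so step 1 probably needs a larger `size` (or a scan) to collect every itemId.

```python
import json


def bundle_lookup_body(product_id):
    """Step 1 body: match docs with this productId, return only itemId."""
    return {
        "query": {"filtered": {"filter": {"term": {"productId": product_id}}}},
        "fields": ["itemId"],
    }


def extract_item_ids(search_response):
    """Collect the distinct itemId values from a step-1 response (assumed layout)."""
    item_ids = set()
    for hit in search_response.get("hits", {}).get("hits", []):
        value = hit.get("fields", {}).get("itemId", [])
        # Field values normally come back as lists; tolerate a bare scalar too.
        if isinstance(value, list):
            item_ids.update(value)
        else:
            item_ids.add(value)
    return sorted(item_ids)


def significant_products_body(item_ids):
    """Step 2 body: significant_terms on productId within the given itemIds."""
    return {
        "query": {"filtered": {"filter": {"terms": {"itemId": list(item_ids)}}}},
        "aggs": {
            "most_sig": {
                "significant_terms": {"field": "productId", "size": 0}
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical step-1 response, trimmed to the parts used here.
    fake_response = {
        "hits": {"hits": [{"fields": {"itemId": [1]}}, {"fields": {"itemId": [2]}}]}
    }
    ids = extract_item_ids(fake_response)
    print(json.dumps(significant_products_body(ids), indent=2))
```

The body from `significant_products_body` is what you would send to `/sales/sales/_search?search_type=count`, as in the request above.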


If I understand correctly, you have one doc per order line item. What you want is a single doc per order. The order doc should have an array of productIds (or an array of line item objects that each include a productId field).

That way, when you query for orders containing product X, the significant_terms aggregation should find that product Y is uncommonly common in these orders.
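As a rough sketch of that remodelling, the Python below folds line-item docs into one doc per order and builds a matching query. It assumes that `itemId` is the key shared by all lines of one order, and `productIds` is a made-up name for the new array field; adjust both to your mapping. The query uses the same ES 1.x `filtered` syntax as the other answer.

```python
from collections import OrderedDict


def group_into_orders(line_items, order_key="itemId"):
    """Fold line-item docs into one doc per order with a productIds array.

    order_key is an assumption: the field that all lines of an order share.
    """
    orders = OrderedDict()
    for line in line_items:
        products = orders.setdefault(line[order_key], [])
        if line["productId"] not in products:  # keep each product once per order
            products.append(line["productId"])
    return [
        {order_key: key, "productIds": products}
        for key, products in orders.items()
    ]


def co_purchase_body(product_id):
    """Orders containing product_id, with significant_terms over productIds."""
    return {
        "query": {"filtered": {"filter": {"term": {"productIds": product_id}}}},
        "aggs": {"most_sig": {"significant_terms": {"field": "productIds"}}},
    }


if __name__ == "__main__":
    lines = [
        {"itemId": 1, "productId": 10},
        {"itemId": 1, "productId": 20},
        {"itemId": 2, "productId": 10},
    ]
    for doc in group_into_orders(lines):
        print(doc)
```

Since every matching order contains product X, X itself will show up among the significant terms; you can simply drop it from the buckets on the client side.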