Significant Terms Aggregation of "flat" structures
It sounds like you're trying to build an item-based recommender. Apache Mahout has tools to help with collaborative filtering (formerly the Taste project).
There is also a Taste plugin for Elasticsearch 1.5.x, which I believe can work with data like yours to produce item-based recommendations.
(Note: This plugin uses Rivers, which were deprecated in Elasticsearch 1.5, so I'd check with the authors about plans to support more recent versions of Elasticsearch before adopting this suggestion.)
I don't have as much data as you do to test this with, so try the following yourself:
- get the list of itemId values for bundles that contain a certain productId that you want to find "stuff" for:
{
  "query": {
    "filtered": {
      "filter": {
        "term": { "productId": 1234 }
      }
    }
  },
  "fields": [ "itemId" ]
}
- then, using this list, create this query:
GET /sales/sales/_search?search_type=count
{
  "query": {
    "filtered": {
      "filter": {
        "terms": { "itemId": [1,2,3,4,5,6,7,11] }
      }
    }
  },
  "aggs": {
    "most_sig": {
      "significant_terms": { "field": "productId", "size": 0 }
    }
  }
}
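The two steps above can be chained in a small script. What follows is a minimal sketch in Python that only builds the request bodies as plain dicts and does not talk to a cluster. The helper names (build_item_lookup_query, item_ids_from_hits, build_sig_terms_query) are hypothetical, and it assumes the first response returns each hit's itemId as a list under "fields", which is the shape Elasticsearch 1.x uses for requested fields.

```python
import json


def build_item_lookup_query(product_id):
    """Step 1: find the itemIds of bundles containing product_id."""
    return {
        "query": {"filtered": {"filter": {"term": {"productId": product_id}}}},
        "fields": ["itemId"],
    }


def item_ids_from_hits(response):
    """Collect the distinct itemIds from a step-1 search response.

    Assumes each hit looks like {"fields": {"itemId": [<id>]}}.
    Hits without that field are skipped.
    """
    ids = set()
    for hit in response["hits"]["hits"]:
        ids.update(hit.get("fields", {}).get("itemId", []))
    return sorted(ids)


def build_sig_terms_query(item_ids):
    """Step 2: significant productIds among the docs with these itemIds."""
    return {
        "query": {"filtered": {"filter": {"terms": {"itemId": item_ids}}}},
        "aggs": {
            "most_sig": {"significant_terms": {"field": "productId", "size": 0}}
        },
    }


if __name__ == "__main__":
    # Made-up response, only to show the shape the helper expects.
    fake_response = {
        "hits": {"hits": [
            {"fields": {"itemId": [2]}},
            {"fields": {"itemId": [1]}},
            {"fields": {"itemId": [2]}},
        ]}
    }
    ids = item_ids_from_hits(fake_response)
    print(json.dumps(build_sig_terms_query(ids), indent=2))
```

Keep in mind that a search returns only 10 hits by default, so the first request probably needs a larger "size" (or a scan) to collect every itemId.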
If I understand correctly, you have a doc per order line item. What you want is a single doc per order. The Order doc should have an array of productIds (or an array of line item objects that each include a productId field).
That way, when you query for orders containing product X, the sig_terms aggregation should find that product Y is uncommonly common in these orders.
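To illustrate the restructuring, here is a minimal Python sketch that folds line-item docs into one doc per order and builds the matching query body. The field names orderId and productIds are assumptions for illustration (the post does not name the order key), and both function names are hypothetical.

```python
from collections import defaultdict


def group_line_items(line_items, order_key="orderId"):
    """Fold one-doc-per-line-item records into one doc per order.

    order_key is an assumed field name; use whatever identifies an order.
    """
    products_by_order = defaultdict(list)
    for item in line_items:
        products = products_by_order[item[order_key]]
        if item["productId"] not in products:  # keep each product once
            products.append(item["productId"])
    return [
        {order_key: order_id, "productIds": products}
        for order_id, products in products_by_order.items()
    ]


def build_order_sig_terms_query(product_id):
    """Significant productIds among orders that contain product_id."""
    return {
        "query": {"filtered": {"filter": {"term": {"productIds": product_id}}}},
        "aggs": {
            "most_sig": {"significant_terms": {"field": "productIds"}}
        },
    }
```

With order docs shaped like this, a single request is enough: the filter selects the orders containing product X, and significant_terms runs over the other products in those same orders.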