Analyze similarities in model data using Elasticsearch and Rails
There is a feature built exactly for this purpose in Elasticsearch called more_like_this. The documentation for the mlt query goes into great detail about how you can achieve exactly what you want to do.
The content you provide to the like field will be analyzed, and the most relevant terms for each field will be used to retrieve documents that contain as many of those terms as possible. If all your records are stored in Elasticsearch, you can use the Multi GET syntax to specify a document already in your index as the content of the like field, like this:
"like" : [ { "_index" : "model", "_type" : "model", "_id" : "1" } ]
Remember that you cannot use index aliases when using this syntax (so you'll have to do a document lookup first if you are not sure which index your document is currently residing in).
If you don't specify the fields field, all fields in the source document will be used. To avoid bad surprises, my suggestion is to always specify the list of fields you want your similar documents to match.
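As a rough illustration of the two points above, here is a minimal Ruby sketch that builds an mlt request body referencing a stored document and an explicit fields list. The helper name mlt_query and the field names title and description are assumptions made up for this example, not part of any library. The _type key is optional here because it only applies to Elasticsearch versions that still use mapping types.

```ruby
require "json"

# Builds a more_like_this query body that points at a document already
# stored in the index. Helper name and field names are hypothetical.
def mlt_query(index:, id:, fields:, type: nil)
  like_doc = { "_index" => index, "_id" => id }
  # Only include a mapping type when the caller supplies one.
  like_doc["_type"] = type if type

  {
    "query" => {
      "more_like_this" => {
        # Explicit list, so unrelated fields do not influence similarity.
        "fields" => fields,
        "like" => [like_doc]
      }
    }
  }
end

body = mlt_query(index: "model", id: "1", fields: ["title", "description"], type: "model")
puts JSON.pretty_generate(body)
```

The resulting hash can be serialized to JSON and sent as the body of a search request.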
If you have non-textual fields that you want to match exactly with the source document, you might want to consider using a bool query and programmatically creating its filter section, so that the documents returned by the mlt query are limited to a filtered subset of your entire index.
You can build these queries in Searchkick using the advanced search feature, manually specifying the body of search requests.
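One way to put this together is sketched below: a bool query with the mlt clause under must and one term filter per exact-match field. The helper name filtered_mlt_body and the fields category_id and published are hypothetical, and the Searchkick call in the trailing comment assumes a model named Model that is indexed with Searchkick.

```ruby
require "json"

# Wraps a more_like_this clause in a bool query whose filter section keeps
# only documents that share exact values with the source document.
# exact_matches maps field names to the source document's values.
def filtered_mlt_body(index:, id:, fields:, exact_matches:)
  filters = exact_matches.map { |field, value| { "term" => { field => value } } }

  {
    "query" => {
      "bool" => {
        "must" => [
          {
            "more_like_this" => {
              "fields" => fields,
              "like" => [{ "_index" => index, "_id" => id }]
            }
          }
        ],
        # Filters narrow the candidate set without affecting the score.
        "filter" => filters
      }
    }
  }
end

body = filtered_mlt_body(
  index: "model",
  id: "1",
  fields: ["title", "description"],
  exact_matches: { "category_id" => 7, "published" => true }
)
puts JSON.pretty_generate(body)

# With Searchkick, a hash like this can be passed as the raw request body:
#   Model.search(body: body)
```

In practice the values in exact_matches would be read from the source record before building the query.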
Read up on using the More Like This Query. This is the query produced by product.similar(). It operates only on text fields. If you also want to compare numeric or date fields, you'll have to incorporate these rules into a scoring script to do what you're asking.
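As one possible shape for such a scoring script, the sketch below wraps the mlt clause in a function_score query with a script_score function that shrinks the score as a numeric field moves away from the source document's value. The helper name, the price field, and the scoring formula are all assumptions chosen for illustration. The script also assumes every candidate document has a value for that field, and the exact script syntax may vary between Elasticsearch versions.

```ruby
require "json"

# Combines text similarity (more_like_this) with a numeric closeness factor.
# Helper name, field names, and the formula are hypothetical.
def mlt_with_numeric_score(index:, id:, fields:, numeric_field:, source_value:)
  {
    "query" => {
      "function_score" => {
        "query" => {
          "more_like_this" => {
            "fields" => fields,
            "like" => [{ "_index" => index, "_id" => id }]
          }
        },
        "script_score" => {
          "script" => {
            # Evaluates to 1.0 when the values are equal and approaches 0
            # as they diverge, so the result is never negative.
            "source" => "1.0 / (1.0 + Math.abs(doc[params.field].value - params.target))",
            "params" => { "field" => numeric_field, "target" => source_value }
          }
        },
        # Multiply the text similarity score by the closeness factor.
        "boost_mode" => "multiply"
      }
    }
  }
end

body = mlt_with_numeric_score(
  index: "model",
  id: "1",
  fields: ["title", "description"],
  numeric_field: "price",
  source_value: 19.99
)
puts JSON.pretty_generate(body)
```

Date fields could be handled the same way by comparing timestamps, though the right formula depends on how quickly similarity should fall off.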