Analyze similarities in model data using Elasticsearch and Rails Analyze similarities in model data using Elasticsearch and Rails elasticsearch elasticsearch

Analyze similarities in model data using Elasticsearch and Rails


There is a feature built exactly for this purpose in Elasticsearch called more_like_this. The documentation for the mlt query goes into great details about how you can achieve exactly what you want to do.

The content you provide to the like field will be analyzed and the most relevant terms for each field will be used to retrieve documents with as many of those relevant terms. If you have all your records stored in Elasticsearch, you can use the Multi GET syntax to specify a document already in your index as content of the like field like this:

    "like" : [      {        "_index" : "model",        "_type" : "model",        "_id" : "1"      }    ]

Remember that you cannot use index aliases when using this syntax (so you'll have to do a document lookup first if you are not sure which index your document is currently residing in).

If you don't specify the fields field, all fields in the source document will be used. My suggestion to avoid bad surprises, is to always specify the list of fields you want your similar documents to match.

If you have non-textual fields that you want to match perfectly with the source document, you might want to consider using a bool query, programmatically creating the filter section to limit documents returned by the mlt query to only a filtered subset of your entire index.

You can build these queries in Searchkick using the advanced search feature, manually specifying the body of search requests.


Read up on using More Like This Query. This is the query produced by product.similar(). It operates only on text fields. If you also want to compare numeric or date fields, you'll have to incorporate these rules into a scoring script to do what you're asking.