
Use multiple stemming languages with ElasticSearch


Quick disclaimer: I'm not an expert in stemming or language morphology, but since no one else is responding, here's my understanding. Most of my experience is with Solr.

To query with stemming across multiple languages and get a single, mixed result set, you need a multilingual stemmer. I'm not sure what is available for Elasticsearch.

If you apply multiple stemmers designed for single languages to a single index, they will step on each other's toes and likely not produce the expected results (stemming rules vary significantly from language to language).

Having an index per language, each with its own stemmer, works for queries with single-language results. Combining results from multiple queries against multiple indices is usually fairly problematic (you have to attempt to normalize relevancy and deal with paging).
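To make the normalization and paging problem concrete, here is a minimal sketch in plain Python. It assumes you have already run one query per language index and collected `(document id, score)` pairs; the index names, the min-max scaling, and the helper names `normalize` and `merge_page` are all illustrative choices of mine, not anything built into Elasticsearch or Solr.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (document id, raw relevance score)


def normalize(hits: List[Hit]) -> List[Hit]:
    """Min-max scale one result list's scores into [0, 1]."""
    if not hits:
        return []
    scores = [score for _, score in hits]
    low, high = min(scores), max(scores)
    if high == low:
        # All scores equal: treat every hit as equally relevant.
        return [(doc_id, 1.0) for doc_id, _ in hits]
    return [(doc_id, (score - low) / (high - low)) for doc_id, score in hits]


def merge_page(results_by_index: Dict[str, List[Hit]], page: int, size: int) -> List[Hit]:
    """Merge per-index result lists and return one page of the combined ranking."""
    merged: List[Hit] = []
    for hits in results_by_index.values():
        merged.extend(normalize(hits))
    # Highest normalized score first; tie-break on id so paging is stable.
    merged.sort(key=lambda hit: (-hit[1], hit[0]))
    start = page * size
    return merged[start:start + size]
```

Note that serving page N this way means fetching at least (N + 1) * size hits from every index first, and min-max scaling is only a rough way to make scores comparable, which is part of why this approach gets awkward.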


You can create two separate indices and search both (or all) of them at the same time. As long as the fields in the indices are the same, you will get valid results.
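As a rough sketch of what searching both indices at once looks like, the snippet below builds the request path and body for a single search spanning several indices, using Elasticsearch's comma-separated index names in the `_search` path. The index names `docs_en` and `docs_de`, the field name `content`, and the helper `build_multi_index_search` are hypothetical; sending the request is left to whatever HTTP client you use.

```python
import json
from typing import Dict, List, Tuple


def build_multi_index_search(indices: List[str], field: str, text: str) -> Tuple[str, str]:
    """Return (path, JSON body) for one search spanning several indices."""
    # Elasticsearch accepts a comma-separated list of index names in the path.
    path = "/" + ",".join(indices) + "/_search"
    # The same field name must exist in every index for the match to apply.
    body: Dict = {"query": {"match": {field: text}}}
    return path, json.dumps(body)


path, body = build_multi_index_search(["docs_en", "docs_de"], "content", "running")
print(path)
print(body)
```

Each index analyzes the query text with its own analyzer for that field, so the same request is stemmed differently per index.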


Earlier this year, Kiju Kim from the Elasticsearch team published some good articles on the elastic.co blog about how to work with multiple languages:

Basically, you can use multiple fields for your content, one for each language you want to support (Part 2), each using a language-specific analyzer (Part 1). Part 3 adds an optimisation: an ingest pipeline (using an ingest plugin for language detection) detects the language and populates only the matching language field instead of all fields.
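The field-per-language idea could look roughly like the sketch below, which builds an index mapping and a query as plain Python dicts. This is my own illustration, not code from the blog posts: the field names (`content_en`, `content_de`), the language list, and the helper names are assumptions, and the mapping uses the typeless format of recent Elasticsearch versions. The `english`, `german`, and `french` analyzers are built-in language analyzers, and `multi_match` with `most_fields` is one plausible way to query all language fields together.

```python
from typing import Dict, List

# Assumed language codes mapped to built-in Elasticsearch language analyzers.
ANALYZERS: Dict[str, str] = {"en": "english", "de": "german", "fr": "french"}


def build_mapping(languages: List[str], base: str = "content") -> Dict:
    """Create one text field per language, each with its own analyzer."""
    properties = {
        f"{base}_{lang}": {"type": "text", "analyzer": ANALYZERS[lang]}
        for lang in languages
    }
    return {"mappings": {"properties": properties}}


def build_query(languages: List[str], text: str, base: str = "content") -> Dict:
    """Search every language field at once and combine their scores."""
    fields = [f"{base}_{lang}" for lang in languages]
    return {
        "query": {
            "multi_match": {"query": text, "fields": fields, "type": "most_fields"}
        }
    }


mapping = build_mapping(["en", "de"])
query = build_query(["en", "de"], "running shoes")
```

With language detection at ingest time, only the field for the detected language would be populated per document, while the query side can stay the same.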