
Partial and Full Phrase Match


It sounds like you'd like to perform keyphrase extraction from your documents using a controlled vocabulary (your dictionary of industry terms and phrases).

[Search terms to help you find related answers on SO and Google: "keyphrase extraction", "controlled vocabulary"]


This level of analysis takes you a bit out of the search stack and into the natural-language processing stack. Since NLP tends to be resource-intensive, it usually runs offline or, in the case of search applications, at index time.

To implement this, you'd:

  1. Integrate a keyphrase extraction tool into your search-indexing code to generate a list of recognized key phrases for each document.
  2. Index those key phrases as shingles into a new Elasticsearch field.
  3. Include this shingled keyphrase field in the list of fields searched at query-time — most likely with a score boost.
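Steps 2 and 3 might look roughly like the sketch below, which only builds the request bodies as plain Python dicts and leaves sending them to whatever client you use. The field names `body` and `keyphrases`, the analyzer and filter names, the shingle sizes, and the boost of 3 are all assumptions made for illustration, and the typeless `mappings` layout assumes Elasticsearch 7 or later.

```python
import json


def build_index_body(max_shingle_size=3):
    """Index settings and mappings with a shingled keyphrase field.

    All names here (body, keyphrases, keyphrase_shingles, ...) are
    hypothetical; rename them to match your own index.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Built-in "shingle" token filter: emits word n-grams.
                    "keyphrase_shingle_filter": {
                        "type": "shingle",
                        "min_shingle_size": 2,
                        "max_shingle_size": max_shingle_size,
                        "output_unigrams": True,
                    }
                },
                "analyzer": {
                    "keyphrase_shingles": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "keyphrase_shingle_filter"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "body": {"type": "text"},
                # Step 2: extracted key phrases, analyzed as shingles.
                "keyphrases": {"type": "text", "analyzer": "keyphrase_shingles"},
            }
        },
    }


def build_search_body(user_text, keyphrase_boost=3.0):
    """Step 3: search the normal field plus the boosted keyphrase field."""
    return {
        "query": {
            "multi_match": {
                "query": user_text,
                "fields": ["body", f"keyphrases^{keyphrase_boost}"],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
    print(json.dumps(build_search_body("plate heat exchanger"), indent=2))
```

The boost value is a starting point to tune against your own relevance judgments, not a recommendation.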

For a quick-win tool to help you with controlled keyphrase extraction, check out KEA (written in Java).

(You could probably also write your own, but if you also want to extract uncontrolled key phrases (those not in your dictionary), a trained extractor will serve you better. More tools here.)
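If you do go the write-your-own route for the controlled case, a minimal sketch could be a longest-match lookup of dictionary phrases over the document's tokens. This is not how KEA works; it is just a naive dictionary matcher, and the tokenizer, function names, and sample vocabulary below are all made up for illustration.

```python
import re


def tokenize(text):
    # Very naive tokenizer (assumption): lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keyphrases(text, vocabulary):
    """Return dictionary phrases found in text, in order of first appearance.

    At each position the longest matching phrase wins, so a longer phrase
    is preferred over a shorter one that starts at the same token.
    """
    phrases = {tuple(tokenize(p)) for p in vocabulary}
    phrases.discard(())  # ignore entries that tokenize to nothing
    max_len = max((len(p) for p in phrases), default=0)

    tokens = tokenize(text)
    found = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in phrases:
                match = candidate
                break
        if match:
            phrase = " ".join(match)
            if phrase not in found:
                found.append(phrase)
            i += len(match)
        else:
            i += 1
    return found


if __name__ == "__main__":
    vocab = ["heat exchanger", "plate heat exchanger", "pump"]
    doc = "The plate heat exchanger feeds a pump. Another heat exchanger is idle."
    print(extract_keyphrases(doc, vocab))
```

The returned list is what you would write into the keyphrase field at index time. It does no stemming or synonym handling, which is where a real extraction tool starts to earn its keep.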