
Retrieve analyzed tokens from ElasticSearch documents


This question is a little old, but I think an additional answer is worthwhile.

ElasticSearch 1.0.0 added the Term Vector API, which gives you direct access to the tokens ElasticSearch stores under the hood on a per-document basis. The API docs are not very clear on this (it is only mentioned in the example), but to use the API you first have to indicate in your mapping definition that you want to store term vectors, by setting the term_vector property on each field.
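As a rough sketch of that workflow: the names your_index, your_type, field_x and the node URL below are placeholders, not something from the question, and the `_termvector` endpoint name is the one 1.x releases used (later versions renamed it to `_termvectors`, so check the docs for your version). Under those assumptions, the mapping and the lookup might look like this:

```shell
#!/bin/sh
# Sketch only: your_index, your_type, field_x and the node URL are placeholders.
ES_URL="${ES_URL:-http://localhost:9200}"

# Mapping body: term_vector tells Elasticsearch to store term vectors for field_x.
# "with_positions_offsets" also keeps token positions and character offsets.
MAPPING_BODY='{
  "mappings": {
    "your_type": {
      "properties": {
        "field_x": {
          "type": "string",
          "term_vector": "with_positions_offsets"
        }
      }
    }
  }
}'

# Create the index with the mapping above (nothing runs until you call this).
create_index() {
  curl -XPUT "$ES_URL/your_index" -d "$MAPPING_BODY"
}

# Fetch the stored terms of one document; $1 is the document ID.
get_term_vector() {
  curl -XGET "$ES_URL/your_index/your_type/$1/_termvector?pretty=true"
}
```

After calling `create_index` and indexing a document, `get_term_vector 1` should return the tokens stored for field_x in the document with ID 1. Documents have to be indexed after the mapping is in place for their term vectors to be stored.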


Have a look at this other answer: elasticsearch - Return the tokens of a field. Unfortunately, it requires reanalyzing the content of your field on the fly using the script provided there.
It should be possible to write a plugin to expose this feature. The idea would be to add two endpoints:

  • one to read the Lucene TermsEnum, as the Solr TermsComponent does, which is also useful for auto-suggestions. Note that this would not be per document: it would return every term in the index with its term frequency and document frequency (potentially expensive when there are many unique terms)
  • one to read the term vectors, if enabled, as the Solr TermVectorComponent does. This would be per document, but it requires storing the term vectors (you can configure this in your mapping), and it can also return positions and offsets if those are enabled.


You may want to use scripting; however, scripting must be enabled on your server.

curl 'http://localhost:9200/your_index/your_type/_search?pretty=true' -d '{
    "query": {
        "match_all": {}
    },
    "script_fields": {
        "terms": {
            "script": "doc[field].values",
            "params": {
                "field": "field_x.field_y"
            }
        }
    }
}'

Whether scripting is allowed by default depends on the ElasticSearch version, so check the official documentation for yours.