How to prevent Facet Terms from tokenizing


If reindexing is an option, it would be best to change the mapping and mark this field as not_analyzed:

"your_field" : { "type": "string", "index" : "not_analyzed" }

You can use the multi_field type if you also want to keep an analyzed version of the field:

"your_field" : {
  "type" : "multi_field",
  "fields" : {
    "your_field" : {"type" : "string", "index" : "analyzed"},
    "untouched" : {"type" : "string", "index" : "not_analyzed"}
  }
}

This way, you can continue using your_field in the queries, while running facet searches using your_field.untouched.
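As a rough sketch of how the two sub-fields might be used together, the request body below queries the analyzed field and facets on the untouched one. It is built as a Python dict only so that it can be printed as JSON. The facet name `by_value` and the search text are made up for illustration, and the `match` query assumes an Elasticsearch version that offers both `match` and the facets API.

```python
import json

# Hypothetical search body: full-text query on the analyzed field,
# terms facet on the not_analyzed sub-field.
search_body = {
    "query": {"match": {"your_field": "passwd"}},  # example search text (assumption)
    "facets": {
        "by_value": {  # facet name is arbitrary (assumption)
            "terms": {"field": "your_field.untouched"}
        }
    },
}

# Print the JSON that would be sent as the body of a _search request.
print(json.dumps(search_body, indent=2))
```

The query still matches individual tokens, while the facet buckets are built from the whole, untokenized values.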

Alternatively, if this field is stored, you can use a script field facet instead:

"facets" : {
  "term" : {
    "terms" : {
      "script_field" : "_fields.your_field.value"
    }
  }
}

As a last resort, if this field is not stored but the record source is stored in the index, you can try this:

"facets" : {
  "term" : {
    "terms" : {
      "script_field" : "_source.your_field"
    }
  }
}

The first solution is the most efficient. The last solution is the least efficient and may take a lot of time on a large index.


Wow, I ran into this same issue today while running a terms aggregation in a recent version of Elasticsearch. After some googling and partial understanding, I found out how this geeky indexing works (which is very simple).

Queries can find only terms that actually exist in the inverted index

When you index the following string

"WEB-MISC /etc/passwd"

it will be passed to an analyzer. The analyzer might tokenize it into

"WEB", "MISC", "etc" and "passwd" 

along with their position details. These tokens might then be filtered to lowercase, giving

"web", "misc", "etc" and "passwd"

So, after indexing, a search query can see only the above four terms, not the complete string "WEB-MISC /etc/passwd". For your requirement, these are the options you can use:

1. Change the default analyzer used by Elasticsearch ([link][1]).
2. If analysis is not needed, just turn it off by setting 'not_analyzed' on the fields you need.
3. To make already indexed data searchable this way, re-indexing is the only option.
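To make the analysis steps described above a bit more concrete, here is a small Python imitation of what happens to terms counts. It is not Elasticsearch code: `simple_analyze` is only a crude stand-in for a real analyzer (split on non-alphanumeric characters, then lowercase), `keep_whole` stands in for not_analyzed, and the third sample document is invented for the example.

```python
import re
from collections import Counter

def simple_analyze(text):
    """Crude stand-in for an analyzer: split on non-alphanumerics, then lowercase."""
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", text) if tok]

def keep_whole(text):
    """Stand-in for not_analyzed: the whole value becomes a single term."""
    return [text]

def term_counts(docs, analyze):
    """Count how many documents contain each indexed term, as a terms facet would."""
    counts = Counter()
    for doc in docs:
        counts.update(set(analyze(doc)))  # each term counted once per document
    return counts

# Sample values; the last one is made up for illustration.
docs = ["WEB-MISC /etc/passwd", "WEB-MISC /etc/passwd", "WEB-IIS cmd.exe access"]

analyzed = term_counts(docs, simple_analyze)
untouched = term_counts(docs, keep_whole)

print(simple_analyze("WEB-MISC /etc/passwd"))
print(analyzed["web"], analyzed["WEB-MISC /etc/passwd"])
print(untouched["WEB-MISC /etc/passwd"])
```

With the analyzed terms, the facet can only report pieces such as "web" or "passwd", and the full string never shows up as a bucket. With the whole value kept as one term, the full string is counted directly.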


I have briefly explained this problem and proposed two solutions here. I have talked about multiple approaches here. One is to use not_analyzed to preserve the string as it is. But since that has the drawback of being case sensitive, a better approach would be to use the keyword tokenizer plus a lowercase filter.
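A minimal sketch of what the keyword tokenizer plus lowercase filter setup could look like, written as a Python dict that prints the index-creation JSON. The analyzer name `lowercase_keyword` and the type name `your_type` are placeholders chosen for this example, and the `string` field type assumes the same older Elasticsearch versions as the mappings above. The small helper only imitates the analyzer locally to show the intended effect.

```python
import json

# Index settings and mapping sketch. "lowercase_keyword" and "your_type"
# are placeholder names (assumptions), not anything from the original posts.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",   # emit the whole value as one token
                    "filter": ["lowercase"],  # then lowercase that single token
                }
            }
        }
    },
    "mappings": {
        "your_type": {
            "properties": {
                "your_field": {"type": "string", "analyzer": "lowercase_keyword"}
            }
        }
    },
}

def lowercase_keyword(text):
    """Local imitation of the analyzer above: one token, lowercased."""
    return [text.lower()]

print(json.dumps(index_body, indent=2))
print(lowercase_keyword("WEB-MISC /etc/passwd"))  # ['web-misc /etc/passwd']
```

The value stays in one piece, so facets are not split into words, and values that differ only in case end up in the same bucket.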