
ElasticSearch - Searching with hyphens


The answer is really simple:

Quote from Igor Motov: Configuring the standard tokenizer

By default the simple_query_string query doesn't analyze the words with wildcards. As a result it searches for all tokens that start with i-ma. The word i-mac doesn't match this request because during analysis it's split into two tokens i and mac and neither of these tokens starts with i-ma. In order to make this query find i-mac you need to make it analyze wildcards:

{  "_source":true,  "query":{    "simple_query_string":{      "query":"u-1*",      "analyze_wildcard":true,      "default_operator":"AND"    }  }}


The quote from Igor Motov is correct: you have to add "analyze_wildcard": true in order to make the query work with wildcards. But it is important to notice that the standard tokenizer splits "u-12" at the hyphen into two separate tokens, "u" and "12".
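You can verify this tokenization yourself with the _analyze API. A minimal request against the standard analyzer looks like this:

GET _analyze
{
  "analyzer": "standard",
  "text": "u-12"
}

The response lists two tokens, u and 12, which is why a wildcard such as u-1* matches nothing unless the wildcard itself is analyzed.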

If preserving the original text is important, do not use a mapping char filter. Otherwise, it can be quite useful.
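As a rough sketch of that option, a mapping char filter can rewrite the hyphen to an underscore before tokenization, so the standard tokenizer keeps the term together. The index name my-index, the analyzer name keep_hyphens and the field name name below are just placeholders, and the mapping format assumes a recent Elasticsearch version (7.x or later, without mapping types):

PUT my-index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "hyphen_to_underscore": {
          "type": "mapping",
          "mappings": ["- => _"]
        }
      },
      "analyzer": {
        "keep_hyphens": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": ["hyphen_to_underscore"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "keep_hyphens"
      }
    }
  }
}

With this analyzer, "u-12" is indexed as the single token u_12, so the original hyphenated token is not preserved in the index; that is the trade-off mentioned above.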

Imagine that you have "m0-77", "m1-77" and "m2-77". If you search for m*-77 you will get zero hits. However, you can replace the "-" (hyphen) with AND in order to connect the two separated tokens, and then search for m* AND 77, which will return the correct hits, as shown below.
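As a sketch, the rewritten query for that example would look like this:

{
  "query": {
    "simple_query_string": {
      "query": "m* AND 77",
      "analyze_wildcard": true
    }
  }
}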

You can do this replacement on the client side.

In your case, u-1* becomes:

{  "query":{    "simple_query_string":{      "query":"u AND 1*",      "analyze_wildcard":true    }  }}

and t-sh* becomes:

  {      "query":{        "simple_query_string":{          "query":"t AND sh*",          "analyze_wildcard":true        }      }    }


If anyone is still looking for a simple workaround to this issue, replace the hyphen with an underscore (_) when indexing data.

For example, O-000022334 should be indexed as O_000022334.

When searching, apply the same replacement to the query term, and convert the underscore back to a hyphen when displaying results. This way you can search for "O-000022334" and it will find the correct match.
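If you would rather not do the replacement in application code, one possible way to automate it at index time is an ingest pipeline with a gsub processor. The pipeline name hyphen_to_underscore and the field order_id below are only illustrative:

PUT _ingest/pipeline/hyphen_to_underscore
{
  "description": "Replace hyphens with underscores at index time",
  "processors": [
    {
      "gsub": {
        "field": "order_id",
        "pattern": "-",
        "replacement": "_"
      }
    }
  ]
}

Documents indexed with ?pipeline=hyphen_to_underscore then store O_000022334 instead of O-000022334, and the query side only needs the same hyphen-to-underscore replacement.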