ElasticSearch - Searching with hyphens
The answer is really simple:
Quote from Igor Motov: Configuring the standard tokenizer
By default the simple_query_string query doesn't analyze the words with wildcards. As a result it searches for all tokens that start with i-ma. The word i-mac doesn't match this request because during analysis it's split into two tokens i and mac and neither of these tokens starts with i-ma. In order to make this query find i-mac you need to make it analyze wildcards:
{
  "_source": true,
  "query": {
    "simple_query_string": {
      "query": "u-1*",
      "analyze_wildcard": true,
      "default_operator": "AND"
    }
  }
}
Igor Motov's quote is correct: you have to add "analyze_wildcard":true to make the query work with wildcards. It is important to note, however, that the hyphen causes "u-12" to be tokenized into "u" and "12", two separate words.
If preserving the original text is important, do not use a mapping char filter. Otherwise it can be quite useful.
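For reference, a mapping char filter could be configured roughly as follows. This is only a sketch of index settings built as a Python dict: the names hyphen_to_underscore and hyphen_analyzer are hypothetical, and mapping the hyphen to an underscore is an assumption about what the filter should do here, not something the answer specifies.

```python
import json

# Hypothetical index settings: a "mapping" char filter rewrites "-" to "_"
# before the standard tokenizer runs, so "u-12" is analyzed as "u_12".
index_settings = {
    "settings": {
        "analysis": {
            "char_filter": {
                "hyphen_to_underscore": {  # hypothetical filter name
                    "type": "mapping",
                    "mappings": ["- => _"],
                }
            },
            "analyzer": {
                "hyphen_analyzer": {  # hypothetical analyzer name
                    "type": "custom",
                    "char_filter": ["hyphen_to_underscore"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            },
        }
    }
}

# Print the JSON body that would be sent when creating the index.
print(json.dumps(index_settings, indent=2))
```

The analyzer would still have to be assigned to the relevant field in the mapping, and the indexed tokens would then contain the underscore rather than the hyphen.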
Imagine that you have "m0-77", "m1-77" and "m2-77". If you search for m*-77 you will get zero hits. However, you can replace the "-" (hyphen) with AND to connect the two separate words and then search for m* AND 77, which will give you the correct hits.
You can do this replacement on the client side.
In your case, for u-1*:
{
  "query": {
    "simple_query_string": {
      "query": "u AND 1*",
      "analyze_wildcard": true
    }
  }
}
And for t-sh*:
{
  "query": {
    "simple_query_string": {
      "query": "t AND sh*",
      "analyze_wildcard": true
    }
  }
}
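The client-side rewrite could look something like the sketch below. The function names are hypothetical. Note one deliberate difference: instead of inserting the literal word AND, the sketch joins the fragments with spaces and sets "default_operator":"AND", as in the first query above, because the simple_query_string syntax expresses conjunction with + or default_operator. Treat that substitution as an assumption about the intended behavior.

```python
import json


def split_on_hyphens(user_input: str) -> str:
    """Turn 'm*-77' into 'm* 77' so each fragment becomes its own term."""
    # Drop empty fragments produced by leading, trailing or doubled hyphens.
    fragments = [part for part in user_input.split("-") if part]
    return " ".join(fragments)


def build_query(user_input: str) -> dict:
    """Build a simple_query_string body that requires every fragment."""
    return {
        "query": {
            "simple_query_string": {
                "query": split_on_hyphens(user_input),
                "analyze_wildcard": True,
                # All fragments must match, which is the AND semantics.
                "default_operator": "AND",
            }
        }
    }


# Request body for the t-sh* search.
print(json.dumps(build_query("t-sh*")))
```

The resulting dict can be passed as the request body to whichever Elasticsearch client the application already uses.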
If anyone is still looking for a simple workaround to this issue, replace the hyphen with an underscore (_) when indexing data.
For example, O-000022334 should be indexed as O_000022334.
When searching, apply the same replacement to the search term, and replace the underscore with a hyphen again when displaying results. This way you can search for "O-000022334" and it will find the correct match.
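A minimal sketch of this workaround on the application side might look like the following. The helper names are hypothetical, and the sketch assumes the stored values never contain a genuine underscore, since otherwise the conversion back to hyphens would not round-trip.

```python
def to_indexed(value: str) -> str:
    """Convert a value (or a search term) to its indexed form."""
    return value.replace("-", "_")


def to_display(value: str) -> str:
    """Convert an indexed value back to the form shown to the user."""
    # Assumes no genuine underscores exist in the original data.
    return value.replace("_", "-")


indexed = to_indexed("O-000022334")  # value written to the index
term = to_indexed("O-000022334")     # the same conversion applied to the search term
shown = to_display(indexed)          # value shown in the results
print(indexed, term, shown)
```

Applying the same to_indexed conversion to both the document and the search term is what keeps the two sides consistent.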