Elasticsearch wildcard search on not_analyzed field


There are a couple of things going wrong here.

First, you say that you don't want terms analyzed at index time. But then there's an analyzer configured (which is used at search time) that generates incompatible terms: they are lowercased.

By default, all terms also end up in the _all field, which uses the standard analyzer. That is the field you end up searching. Since the standard analyzer tokenizes on "-", you end up with an OR of "*SVF" and "1*".
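A rough way to see what the standard analyzer does to these values, without a cluster, is to lowercase them and split on every character that is not a letter or digit. This is only an approximation for plain ASCII input, using tr as a stand-in; the real standard tokenizer follows Unicode word-boundary rules, and Elasticsearch's _analyze API is the authoritative check.

```shell
#!/bin/bash
# Approximates the standard analyzer for plain ASCII input only:
# lowercase, then treat every non-alphanumeric character (such as "-")
# as a token boundary. This is an illustration, not Lucene's tokenizer.
approx_standard_tokens() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs '[:alnum:]' '\n' \
    | sed '/^$/d'
}

for value in "SVF-123" "SVF-234"; do
  echo "$value -> $(approx_standard_tokens "$value" | paste -sd' ' -)"
done
```

Both values share the token svf, which is why a search that is split at the hyphen can match both documents through _all.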

Try a terms facet on _all and on name to see which terms were actually indexed.

Here's a runnable Play and gist: https://www.found.no/play/gist/3e5fcb1b4c41cfc20226 (https://gist.github.com/alexbrasetvik/3e5fcb1b4c41cfc20226)

You need to make sure the terms you index are compatible with what you search for. You probably want to disable _all, since it can muddy what's going on.

    #!/bin/bash

    export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

    # Create indexes

    curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
        "settings": {
            "analysis": {
                "text": [
                    "SVF-123",
                    "SVF-234"
                ],
                "analyzer": {
                    "analyzer_keyword": {
                        "type": "custom",
                        "tokenizer": "keyword",
                        "filter": [
                            "lowercase"
                        ]
                    }
                }
            }
        },
        "mappings": {
            "type": {
                "properties": {
                    "name": {
                        "type": "string",
                        "index": "not_analyzed",
                        "analyzer": "analyzer_keyword"
                    }
                }
            }
        }
    }'

    # Index documents

    curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '{"index":{"_index":"play","_type":"type"}}
    {"name":"SVF-123"}
    {"index":{"_index":"play","_type":"type"}}
    {"name":"SVF-234"}
    '

    # Do searches

    # See all the generated terms.
    curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '{
        "facets": {
            "name": {
                "terms": {
                    "field": "name"
                }
            },
            "_all": {
                "terms": {
                    "field": "_all"
                }
            }
        }
    }'

    # Analyzed, so no match: analyzer_keyword turns "SVF-123" into "svf-123"
    # at search time, which is not the term "SVF-123" that was indexed.
    curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '{
        "query": {
            "match": {
                "name": {
                    "query": "SVF-123"
                }
            }
        }
    }'

    # Not analyzed according to `analyzer_keyword`, so matches. (Note: term, not match)
    curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '{
        "query": {
            "term": {
                "name": {
                    "value": "SVF-123"
                }
            }
        }
    }'

    # _all was built with the standard analyzer, so it holds the lowercased token "svf".
    curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '{
        "query": {
            "term": {
                "_all": {
                    "value": "svf"
                }
            }
        }
    }'
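Following the advice to keep indexed and searched terms compatible and to disable _all, the mapping for the play index could drop the lowercasing analyzer from name and turn _all off. The sketch below only prints such a mapping (ES 0.90/1.x syntax, reusing the play index and type names from the script); it is one possible fix, not the only one.

```shell
#!/bin/bash
# Print a mapping for the "play" index with two changes from the script:
#   - "name" is not_analyzed and has no lowercasing analyzer, so it is
#     searched against exactly the term that was indexed;
#   - "_all" is disabled, so nothing is copied into it and a query that
#     defaults to _all no longer matches on tokenized fragments.
# Send it with: curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d "$(print_mapping)"
print_mapping() {
  cat <<'EOF'
{
    "mappings": {
        "type": {
            "_all": {"enabled": false},
            "properties": {
                "name": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}
EOF
}

print_mapping
```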


My solution adventure

I started from the setup described in my question. Whenever I changed part of my settings, one thing started working but another stopped. Here is my solution history:

1.) I indexed my data with the default settings, which means the fields were analyzed. This caused problems for me. For example:

When a user searched for a keyword like SVF-1, the system ran this query:

    {
        "query": {
            "filtered": {
                "query": {
                    "query_string": {
                        "analyze_wildcard": true,
                        "query": "*SVF-1*"
                    }
                }
            }
        }
    }

and the results were:

    SVF-123
    SVF-234

This is expected, because the name field of my documents is analyzed. The query is split into the tokens SVF and 1, and SVF matches both documents even though 1 does not match SVF-234. So I abandoned this approach and created a mapping that makes my fields not_analyzed:

    {
        "mappings": {
            "product": {
                "properties": {
                    "name": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "site": {
                        "type": "string",
                        "index": "not_analyzed"
                    }
                }
            }
        }
    }

But my problem continued.

2.) After a lot of research, I tried another approach: a wildcard query. My query was:

    {
        "query": {
            "wildcard": {
                "name": {
                    "value": "*SVF-1*"
                }
            }
        },
        "filter": {
            "term": {"site": "pro_en_GB"}
        }
    }

This query worked, but there was one problem. My fields are now not_analyzed, and wildcard queries are not analyzed either, so the search is case-sensitive: if I search for svf-1, it returns nothing. That matters because users may type the lowercase version of the query.
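The case problem is easy to reproduce outside Elasticsearch. Because the field is not_analyzed and the wildcard query is not analyzed either, the pattern is compared against the stored term exactly as written. The sketch below uses bash's [[ == ]] pattern matching as a stand-in for that comparison; it is an analogy, not Lucene's implementation.

```shell
#!/bin/bash
# Analogy: a not_analyzed field stores "SVF-123" verbatim, and the wildcard
# pattern is used as typed, so matching is case-sensitive. bash's
# [[ string == pattern ]] with an unquoted pattern behaves the same way.
glob_matches() {
  local term="$1" pattern="$2"
  [[ $term == $pattern ]]
}

for pattern in '*SVF-1*' '*svf-1*'; do
  for term in SVF-123 SVF-234; do
    if glob_matches "$term" "$pattern"; then
      echo "$pattern matches $term"
    else
      echo "$pattern does not match $term"
    fi
  done
done
```

Only *SVF-1* against SVF-123 matches; the lowercase pattern matches nothing, just like the svf-1 search above.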

3.) I changed my mapping to:

    {
        "mappings": {
            "product": {
                "properties": {
                    "name": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "nameLowerCase": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "site": {
                        "type": "string",
                        "index": "not_analyzed"
                    }
                }
            }
        }
    }

I added one more field for the name, called nameLowerCase. When indexing, I build each document like this:

    {
        "name": "SVF-123",
        "nameLowerCase": "svf-123",
        "site": "pro_en_GB"
    }
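The lowercase copy has to be produced by whatever builds the documents. As a minimal shell sketch, a hypothetical helper make_product_doc could emit a document like the one above; it assumes the values contain no characters that need JSON escaping, such as quotes or backslashes.

```shell
#!/bin/bash
# Hypothetical helper: emit a product document carrying both the original
# name and a lowercased copy for the nameLowerCase field.
# Assumes name and site contain no characters that need JSON escaping.
make_product_doc() {
  local name="$1" site="$2"
  local lower
  lower=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
  printf '{"name": "%s", "nameLowerCase": "%s", "site": "%s"}\n' \
    "$name" "$lower" "$site"
}

make_product_doc "SVF-123" "pro_en_GB"
```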

At search time, I convert the query keyword to lowercase and search the new nameLowerCase field, while still displaying the original name field.
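The lowercasing step on the search side can be sketched the same way. The hypothetical helper build_name_query below lowercases the user's input, wraps it in * on both sides, and emits a wildcard query on nameLowerCase limited to one site. It assumes the input contains no wildcard characters (* or ?), quotes, or backslashes. Unlike the author's query, it places the site term filter inside a filtered query (ES 0.90/1.x syntax) instead of a top-level filter; the hits are the same, but the filter then also applies to any facets.

```shell
#!/bin/bash
# Hypothetical helper: build a case-insensitive "contains" search by
# lowercasing the user input and matching it against nameLowerCase.
# Assumes the input has no wildcard characters (* ?), quotes, or backslashes.
build_name_query() {
  local input="$1" site="$2"
  local lower
  lower=$(printf '%s' "$input" | tr '[:upper:]' '[:lower:]')
  printf '{"query": {"filtered": {"query": {"wildcard": {"nameLowerCase": {"value": "*%s*"}}}, "filter": {"term": {"site": "%s"}}}}}\n' \
    "$lower" "$site"
}

# Usage against a cluster:
#   curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d "$(build_name_query 'SVF-1' pro_en_GB)"
build_name_query "SVF-1" "pro_en_GB"
```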

The final version of my query is:

    {
        "query": {
            "wildcard": {
                "nameLowerCase": {
                    "value": "*svf-1*"
                }
            }
        },
        "filter": {
            "term": {"site": "pro_en_GB"}
        }
    }

Now it works. Another way to solve this problem would be a multi_field mapping, but since my query contains a dash (-), I ran into some problems with it.
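For reference, a multi_field mapping along those lines might look like the sketch below (ES 0.90/1.x syntax). The sub-field name lowercase and the analyzer name lowercase_keyword are hypothetical. The analyzer pairs a keyword tokenizer with a lowercase filter, like analyzer_keyword in the first answer, so each whole value, dash included, becomes a single lowercased term that can be queried as name.lowercase. Because wildcard queries are not analyzed, the user input still has to be lowercased before searching.

```shell
#!/bin/bash
# Print index settings plus a mapping (ES 0.90/1.x multi_field syntax) that
# keeps the original value in "name" and a lowercased single-term copy in
# "name.lowercase". Sub-field and analyzer names are illustrative only.
print_multi_field_mapping() {
  cat <<'EOF'
{
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"]
                }
            }
        }
    },
    "mappings": {
        "product": {
            "properties": {
                "name": {
                    "type": "multi_field",
                    "fields": {
                        "name": {"type": "string", "index": "not_analyzed"},
                        "lowercase": {"type": "string", "analyzer": "lowercase_keyword"}
                    }
                },
                "site": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}
EOF
}

print_multi_field_mapping
```

A wildcard query would then target name.lowercase with a lowercased value such as *svf-1*, without storing a second field in the source documents.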

Many thanks to @Alex Brasetvik for his detailed explanation and effort.


Adding to Hüseyin's answer, we can use AND as the default operator. Then SVF and 1* are joined with AND instead of OR, which gives the correct results.

    {
        "query": {
            "filtered": {
                "query": {
                    "query_string": {
                        "default_operator": "AND",
                        "analyze_wildcard": true,
                        "query": "*SVF-1*"
                    }
                }
            }
        }
    }
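The difference default_operator makes can be illustrated with a toy model; this is not Elasticsearch's actual query parser. Each document is represented by its lowercased tokens, roughly as the standard analyzer would produce them, and the query by the two clauses svf and 1*. With OR, a document matches if any clause matches; with AND, every clause has to match.

```shell
#!/bin/bash
# Toy model of clause combination; not Elasticsearch's real parser.
# Documents are "name:tokens" pairs; the query is the clauses "svf" and "1*".
docs=("SVF-123:svf 123" "SVF-234:svf 234")
clauses=("svf" "1*")

# True if any token of the document matches the glob-style clause.
clause_matches() {
  local tokens="$1" clause="$2" tok
  for tok in $tokens; do
    [[ $tok == $clause ]] && return 0
  done
  return 1
}

# Combine clause results with OR (any clause) or AND (all clauses).
doc_matches() {
  local tokens="$1" op="$2" clause
  for clause in "${clauses[@]}"; do
    if clause_matches "$tokens" "$clause"; then
      [[ $op == OR ]] && return 0
    else
      [[ $op == AND ]] && return 1
    fi
  done
  [[ $op == AND ]]
}

for op in OR AND; do
  for entry in "${docs[@]}"; do
    name=${entry%%:*}
    tokens=${entry#*:}
    if doc_matches "$tokens" "$op"; then
      echo "$op: $name matches"
    fi
  done
done
```

Under OR both documents match, because each contains svf; under AND only SVF-123 matches, because only it also has a token starting with 1.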