Search by Special Character using ElasticSearch


I am changing my answer based on the discussion with @jeeten. The answer given by @Nishant would also work, but it has the following functional and non-functional issues:

Functional issue:

  1. Only the ? and / special characters should be searchable, whereas that answer allows searching on all punctuation.

Non-functional issues:

  1. It indexes three fields in different formats, which increases the index size on disk and also puts more pressure on memory, as Elasticsearch caches the inverted index for better search performance.

  2. Searching has to run against all three fields, and searching more fields again hurts performance.

  3. Tokens of the title field are duplicated across the three fields.

My solution

To address the above functional and non-functional issues, I used the [pattern_capture][1] token filter to index only ? and /. It also sets "preserve_original": true to support searches like foo?.

I am also indexing only two fields and searching only those two fields, to improve performance.
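As a rough illustration of what this filter chain emits, here is a plain-Python approximation (not Elasticsearch code) of a keyword tokenizer followed by pattern_capture with preserve_original and lowercase. The function name `splchar_tokens` is hypothetical, and the exact token order and duplicate handling inside Elasticsearch may differ from this sketch.

```python
import re

# Same capture pattern as in the token filter: a single ? or / character.
CAPTURE = re.compile(r"([?/])")

def splchar_tokens(text):
    """Approximate keyword tokenizer + pattern_capture (preserve_original) + lowercase."""
    # The keyword tokenizer emits the whole input as one token,
    # and preserve_original keeps that token alongside the captures.
    tokens = [text]
    # Each captured ? or / becomes an additional token.
    tokens += CAPTURE.findall(text)
    # The lowercase filter runs last.
    return [t.lower() for t in tokens]

print(splchar_tokens("Are you ready to change the climate?"))
# ['are you ready to change the climate?', '?']
```

The standalone ? token is what lets a query for ? match on the sub-field, while ordinary keywords are still matched through the main title field.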

Index definition

{
  "settings": {
    "analysis": {
      "filter": {
        "splcharfilter": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "([?/])"
          ]
        }
      },
      "analyzer": {
        "splcharanalyzer": {
          "tokenizer": "keyword",
          "filter": [
            "splcharfilter",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "splchar": {
            "type": "text",
            "analyzer": "splcharanalyzer"
          }
        }
      }
    }
  }
}

Note: the pattern ([?/]) can be extended for future requirements.

Search query

{
  "query": {
    "query_string": {
      "query": "\\?",
      "fields": ["title", "title.splchar"]
    }
  }
}

Note: change the "query" value according to what you are searching for. Only 2 fields are searched.

Search result

"hits": [
  {
    "_index": "pattern-capture",
    "_type": "_doc",
    "_id": "2",
    "_score": 1.0341108,
    "_source": {
      "title": "Are you ready to change the climate?"
    }
  },
  {
    "_index": "pattern-capture",
    "_type": "_doc",
    "_id": "4",
    "_score": 1.0341108,
    "_source": {
      "title": "What are the effects of direct public transfers on social solidarity?"
    }
  }
]

P.S. I am not listing all the search queries and their output, to keep the answer short, but anybody can index the documents and change the search query to confirm it works as expected.


Taking the following example from chat as the base:

Some example titles:

  title: Climate: The case of Nigerian agriculture
  title: Are you ready to change the climate?
  title: A literature review with a particular focus on the school staff
  title: What are the effects of direct public transfers on social solidarity?
  title: Community-Led Practical and/or Social Support Interventions for Adults Living at Home.

  If I search by only "?" then it should return the 2nd and 4th results.
  If I search by "/" then it should return only last record.
  Search by climate then 1st and 2nd results.
  Search by climate? then 1st, 2nd, and 4th results.

The solution requires creating analyzers for the following cases:

  1. To search for a special character. I'm treating punctuation such as / and ? as special characters.
  2. To search for a keyword together with a special character, e.g. climate?
  3. To search for a keyword, e.g. climate

For case 1 we'll use the pattern tokenizer, but instead of using the pattern to split the text we'll use it to extract the special characters as tokens. To do this we set "group": 0 when defining the tokenizer. E.g. for the text xyz a/b pq? the tokens generated will be / and ?.
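The extraction behaviour for case 1 can be approximated in plain Python (this is not Elasticsearch code). The sketch assumes that \p{Punct} corresponds to the ASCII punctuation characters in Python's `string.punctuation`; the name `punct_tokens` is hypothetical.

```python
import re
import string

# Character class covering ASCII punctuation, standing in for \p{Punct}.
PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")

def punct_tokens(text):
    """Emit every punctuation character as its own token (group 0 = whole match)."""
    return PUNCT.findall(text)

print(punct_tokens("xyz a/b pq?"))
# ['/', '?']
```

Because the pattern is used to match rather than to split, the ordinary words are dropped and only the punctuation survives as tokens.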

For case 2 we'll create a custom analyzer with the lowercase filter (to make the search case-insensitive) and the whitespace tokenizer (to keep special characters attached to keywords). E.g. for the text How many? the tokens generated will be how and many?.
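A minimal plain-Python approximation of case 2 (again, not Elasticsearch code; `ci_whitespace_tokens` is a hypothetical name) splits on whitespace and lowercases each piece:

```python
def ci_whitespace_tokens(text):
    """Approximate whitespace tokenizer + lowercase filter."""
    # str.split() with no arguments splits on runs of whitespace,
    # so punctuation stays attached to the neighbouring word.
    return [token.lower() for token in text.split()]

print(ci_whitespace_tokens("How many?"))
# ['how', 'many?']
```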

For case 3 we'll use the standard analyzer, which is the default analyzer.

The next step is to create sub-fields for title. title is of type text and by default uses the standard analyzer. This mapping property will have two sub-fields: withSplChar, of type text with the analyzer created for case 2 (ci_whitespace), and splChars, of type text with the analyzer created for case 1 (splchar).
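To see why searching the three fields together satisfies the expectations from the chat example, the sketch below simulates it in plain Python. It is only an approximation: the three functions stand in for the standard, ci_whitespace, and splchar analyzers, a document counts as a hit when the query shares at least one token with it in any field, and scoring is ignored. All function names are hypothetical.

```python
import re
import string

TITLES = {
    1: "Climate: The case of Nigerian agriculture",
    2: "Are you ready to change the climate?",
    3: "A literature review with a particular focus on the school staff",
    4: "What are the effects of direct public transfers on social solidarity?",
    5: "Community-Led Practical and/or Social Support Interventions for Adults Living at Home.",
}

PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")

def standard(text):
    # Rough stand-in for the standard analyzer: lowercase word tokens only.
    return re.findall(r"\w+", text.lower())

def ci_whitespace(text):
    # Whitespace tokenizer + lowercase: punctuation stays attached to words.
    return [token.lower() for token in text.split()]

def splchar(text):
    # Pattern tokenizer with group 0: punctuation characters only.
    return PUNCT.findall(text)

ANALYZERS = (standard, ci_whitespace, splchar)  # title, withSplChar, splChars

def search(query):
    """Return ids of titles that share a token with the query in any field."""
    hits = []
    for doc_id, title in TITLES.items():
        if any(set(analyze(query)) & set(analyze(title)) for analyze in ANALYZERS):
            hits.append(doc_id)
    return hits

print(search("?"))         # [2, 4]
print(search("/"))         # [5]
print(search("climate"))   # [1, 2]
print(search("climate?"))  # [1, 2, 4]
```

In this simulation, climate? matches title 1 through the standard field (climate), title 2 through all three fields, and title 4 through the punctuation-only field (?).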

Let's now see the above in action:

PUT test
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "splchar": {
          "type": "pattern",
          "pattern": "\\p{Punct}",
          "group": 0
        }
      },
      "analyzer": {
        "splchar": {
          "tokenizer": "splchar"
        },
        "ci_whitespace": {
          "type": "custom",
          "filter": [
            "lowercase"
          ],
          "tokenizer": "whitespace"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "withSplChar": {
            "type": "text",
            "analyzer": "ci_whitespace"
          },
          "splChars": {
            "type": "text",
            "analyzer": "splchar"
          }
        }
      }
    }
  }
}

Let's now index documents as in above example:

POST test/_bulk
{"index":{"_id":"1"}}
{"title":"Climate: The case of Nigerian agriculture"}
{"index":{"_id":"2"}}
{"title":"Are you ready to change the climate?"}
{"index":{"_id":"3"}}
{"title":"A literature review with a particular focus on the school staff"}
{"index":{"_id":"4"}}
{"title":"What are the effects of direct public transfers on social solidarity?"}
{"index":{"_id":"5"}}
{"title":"Community-Led Practical and/or Social Support Interventions for Adults Living at Home."}

Search for ?

POST test/_search
{
  "query": {
    "query_string": {
      "query": "\\?",
      "fields": ["title", "title.withSplChar", "title.splChars"]
    }
  }
}

Result:

"hits" : [
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 0.8025915,
    "_source" : {
      "title" : "Are you ready to change the climate?"
    }
  },
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "4",
    "_score" : 0.8025915,
    "_source" : {
      "title" : "What are the effects of direct public transfers on social solidarity?"
    }
  }
]

Search for climate

POST test/_search
{
  "query": {
    "query_string": {
      "query": "climate",
      "fields": ["title", "title.withSplChar", "title.splChars"]
    }
  }
}

Result:

"hits" : [
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0341107,
    "_source" : {
      "title" : "Climate: The case of Nigerian agriculture"
    }
  },
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 0.98455274,
    "_source" : {
      "title" : "Are you ready to change the climate?"
    }
  }
]

Search for climate?

POST test/_search
{
  "query": {
    "query_string": {
      "query": "climate\\?",
      "fields": ["title", "title.withSplChar", "title.splChars"]
    }
  }
}

Result:

"hits" : [
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 1.5366155,
    "_source" : {
      "title" : "Are you ready to change the climate?"
    }
  },
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0341107,
    "_source" : {
      "title" : "Climate: The case of Nigerian agriculture"
    }
  },
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "4",
    "_score" : 0.8025915,
    "_source" : {
      "title" : "What are the effects of direct public transfers on social solidarity?"
    }
  }
]