Elasticsearch analyzer - lowercase and whitespace tokenizer
I managed to write a custom analyzer, and this works:
"settings": {
  "analysis": {
    "analyzer": {
      "lowercasespaceanalyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": [ "lowercase" ]
      }
    }
  }
},
"mappings": {
  "my_type": {
    "properties": {
      "title": {
        "type": "string",
        "analyzer": "lowercasespaceanalyzer",
        "tokenizer": "whitespace",
        "search_analyzer": "whitespace",
        "filter": [ "lowercase" ]
      }
    }
  }
}
You have two options:
Simple Analyser
The built-in simple analyser, which splits on non-letter characters and lowercases each token, will probably meet your needs:
curl -XGET 'localhost:9200/myindex/_analyze?analyzer=simple&pretty' -d 'Some DATA'
{
  "tokens" : [ {
    "token" : "some",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "data",
    "start_offset" : 5,
    "end_offset" : 9,
    "type" : "word",
    "position" : 2
  } ]
}
To use the simple analyser in your mapping:
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "simple"
        }
      }
    }
  }
}
Custom Analyser
The second option is to define your own custom analyser, specifying how to tokenise and filter the data, and then refer to that analyser in your mapping.
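As a minimal sketch (reusing the `lowercasespaceanalyzer` name from the question and a hypothetical `myindex`), the index settings and mapping could look like this. Note that the field mapping only needs to reference the analyser by name; the tokenizer and filters belong inside the analyser definition, not on the field itself:

```
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercasespaceanalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "lowercasespaceanalyzer"
        }
      }
    }
  }
}
```

You can then check the behaviour with `curl -XGET 'localhost:9200/myindex/_analyze?analyzer=lowercasespaceanalyzer&pretty' -d 'Some DATA'`, which should produce the tokens `some` and `data`.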