
Does ElasticSearch support Unicode / Chinese?


From the ElasticSearch docs about term query:

Matches documents that have fields that contain a term (not analyzed).

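The name field is analyzed by default, so its full value cannot be found by a term query: a term query does not analyze the search text and only matches terms exactly as they are stored in the index. You can verify this by indexing another document with a different, non-Chinese name; a term query for that full name will typically not find it either, because the standard analyzer splits and lowercases the value. You may now be wondering why the following search query returns results anyway: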

curl -XGET 'http://localhost:9200/test/test/_search?pretty=1' -d '{"query" : {"term" : { "name" : "好" }}}'

That is because each indexed token is itself an unanalyzed term. If you index a document with the name "你好吗", a term query for "好吗" or "你好" will not find it, but a term query for "你", "好" or "吗" will.
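To make the behavior above concrete, here is a minimal Python sketch of a toy inverted index. It is not Elasticsearch code: the functions analyze, build_index and term_query are hypothetical names, and the tokenizer only approximates the standard analyzer by assuming one token per Han character and lowercased, word-split tokens for everything else.

```python
import re

# Toy model only; this is NOT how Elasticsearch is implemented.
# Assumption: one token per Han character, lowercased words otherwise.
TOKEN_PATTERN = re.compile(r"[\u4e00-\u9fff]|[^\W\u4e00-\u9fff]+")


def analyze(text):
    """Split text into tokens, roughly like an analyzed field at index time."""
    return TOKEN_PATTERN.findall(text.lower())


def build_index(docs):
    """Map every token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index, term):
    """Look the term up verbatim, without analyzing it first."""
    return index.get(term, set())


docs = {1: "你好吗", 2: "Hello World"}
index = build_index(docs)

# Single characters are indexed terms, so they match document 1.
single_char_hits = {t: term_query(index, t) for t in ("你", "好", "吗")}

# Multi-character strings were never stored as one term, so nothing matches.
multi_char_hits = {t: term_query(index, t) for t in ("你好", "好吗")}

# The same applies to a non-Chinese name: only the analyzed tokens match.
full_name_hits = term_query(index, "Hello World")
lowercase_token_hits = term_query(index, "hello")
```

In this sketch the lookups for "你", "好" and "吗" each return document 1, while "你好", "好吗" and "Hello World" return nothing, which mirrors what the term query does against an analyzed field.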

For Chinese you may need to pay special attention to the analyzer used. For me, though, the standard analyzer seems good enough: it tokenizes Chinese phrases character by character rather than on whitespace.


The default analyzer is not well suited to Asian languages. Try using an analyzer like this one: https://github.com/elasticsearch/elasticsearch-analysis-smartcn