How to configure the synonyms_path in Elasticsearch
I'm not sure whether your problem comes from how you defined the synonyms for "bar". Since you said you are fairly new to this, here is a working example similar to yours. It shows how Elasticsearch deals with synonyms at search time and at index time. Hope it helps.
First, create the synonym file (synonyms.txt):
foo => foo bar, baz
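To make the rule format concrete, here is a minimal Python sketch of how a line in this Solr-style format can be read. It is a simplified model for illustration only, not Elasticsearch's actual parser, and the function name parse_rule is hypothetical.

```python
def parse_rule(line):
    """Split one Solr-style synonym rule into (inputs, outputs).

    Each side is a list of word sequences: commas separate alternatives,
    spaces separate the words inside one alternative.
    """
    def split_side(side):
        return [alt.split() for alt in side.split(",") if alt.strip()]

    if "=>" in line:
        left, right = line.split("=>", 1)
        return split_side(left), split_side(right)
    # Without "=>" every listed alternative is equivalent to the others.
    group = split_side(line)
    return group, group


inputs, outputs = parse_rule("foo => foo bar, baz")
print(inputs)   # [['foo']]
print(outputs)  # [['foo', 'bar'], ['baz']]
```

Note that the right-hand side has two alternatives: the two-word sequence foo bar and the single word baz. That is why bar ends up one position after foo in the _analyze output at the end of this answer.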
Now I create the index with the particular settings you are trying to test:
curl -XPUT 'http://localhost:9200/test/' -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": ["synonym"]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt"
          }
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "text_1": {
          "type": "string",
          "analyzer": "synonym"
        },
        "text_2": {
          "search_analyzer": "standard",
          "index_analyzer": "standard",
          "type": "string"
        },
        "text_3": {
          "type": "string",
          "search_analyzer": "synonym",
          "index_analyzer": "standard"
        }
      }
    }
  }
}'
Note that synonyms.txt must be in the same directory as the configuration file, since the path is relative to the config directory.
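As a rough illustration of what "relative to the config directory" means, the sketch below joins a relative synonyms_path onto a config directory. The directory /etc/elasticsearch is only an assumption for the example (the real location depends on your installation), and resolve_synonyms_path is a hypothetical helper, not part of Elasticsearch.

```python
from pathlib import PurePosixPath

# Hypothetical config directory; the real location depends on how
# Elasticsearch was installed.
CONFIG_DIR = PurePosixPath("/etc/elasticsearch")


def resolve_synonyms_path(synonyms_path, config_dir=CONFIG_DIR):
    """Return the file location implied by a relative synonyms_path."""
    return config_dir / synonyms_path


print(resolve_synonyms_path("synonyms.txt"))
# /etc/elasticsearch/synonyms.txt
print(resolve_synonyms_path("analysis/synonyms.txt"))
# /etc/elasticsearch/analysis/synonyms.txt
```

So with "synonyms_path" : "synonyms.txt", the file is expected directly inside the config directory, next to the configuration file.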
Now index a doc:
curl -XPUT 'http://localhost:9200/test/test/1' -d '{ "text_3": "baz dog cat", "text_2": "foo dog cat", "text_1": "foo dog cat"}'
Now the searches
Searching in field text_1
curl -XGET 'http://localhost:9200/test/_search?q=text_1:baz'
result:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.15342641,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 0.15342641,
        "_source": {
          "text_3": "baz dog cat",
          "text_2": "foo dog cat",
          "text_1": "foo dog cat"
        }
      }
    ]
  }
}
You get the document because baz is a synonym of foo, and at index time foo is expanded with its synonyms.
Searching in field text_2
curl -XGET 'http://localhost:9200/test/_search?q=text_2:baz'
result:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
There are no hits because synonyms were not expanded while indexing (standard analyzer), and the search uses the standard analyzer as well. Since I'm searching for baz and baz is not in the text, nothing matches.
Searching in field text_3
curl -XGET 'http://localhost:9200/test/_search?q=text_3:foo'
result:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.15342641,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 0.15342641,
        "_source": {
          "text_3": "baz dog cat",
          "text_2": "foo dog cat",
          "text_1": "foo dog cat"
        }
      }
    ]
  }
}
Note: text_3 is "baz dog cat"
text_3 was indexed without expanding synonyms. Since I'm searching for foo, which has "baz" as one of its synonyms, and this field expands synonyms at search time, I get the result.
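The three searches can be reproduced with a small Python model of index-time versus search-time expansion. This is only a sketch under simplifying assumptions: the two functions stand in for the real analyzers (the actual standard analyzer does more than lowercase and split), token positions are ignored, and a query matches if any of its analyzed terms was indexed. All names here are hypothetical.

```python
# Simplified model of the rule "foo => foo bar, baz": the token "foo"
# is replaced by the words of all its alternatives.
SYNONYMS = {"foo": ["foo", "bar", "baz"]}


def standard(text):
    """Stand-in for the standard analyzer: lowercase and split, no synonyms."""
    return text.lower().split()


def synonym(text):
    """Stand-in for the custom analyzer: whitespace split plus expansion."""
    terms = []
    for token in text.split():
        terms.extend(SYNONYMS.get(token, [token]))
    return terms


# (index-time analyzer, search-time analyzer) per field, as in the mapping.
FIELDS = {
    "text_1": (synonym, synonym),
    "text_2": (standard, standard),
    "text_3": (standard, synonym),
}

DOC = {
    "text_1": "foo dog cat",
    "text_2": "foo dog cat",
    "text_3": "baz dog cat",
}


def matches(field, query):
    """True if any analyzed query term equals an indexed term of the field."""
    index_analyzer, search_analyzer = FIELDS[field]
    indexed = set(index_analyzer(DOC[field]))
    return any(term in indexed for term in search_analyzer(query))


print(matches("text_1", "baz"))  # True: "foo" was expanded when indexing
print(matches("text_2", "baz"))  # False: no expansion on either side
print(matches("text_3", "foo"))  # True: "foo" was expanded when searching
```

The point of the model is that expansion only has to happen on one side. text_1 pays for it at index time, text_3 at search time, and text_2 never expands, so it only matches the literal terms in the text.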
If you want to debug, you can use the _analyze endpoint, for example:
curl -XGET 'http://localhost:9200/test/_analyze?text=foo&analyzer=synonym&pretty=true'
result:
{
  "tokens": [
    {
      "token": "foo",
      "start_offset": 0,
      "end_offset": 3,
      "type": "SYNONYM",
      "position": 1
    },
    {
      "token": "baz",
      "start_offset": 0,
      "end_offset": 3,
      "type": "SYNONYM",
      "position": 1
    },
    {
      "token": "bar",
      "start_offset": 0,
      "end_offset": 3,
      "type": "SYNONYM",
      "position": 2
    }
  ]
}
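The positions in this output follow from the right-hand side of the rule: foo bar is a two-word alternative, so bar lands one position after foo, while baz is a single word and shares the position of foo. Here is a minimal Python sketch of that layout. It is a simplified model, not the synonym filter's implementation, and expand is a hypothetical name.

```python
# The right-hand side of "foo => foo bar, baz": two alternatives,
# one of them two words long.
ALTERNATIVES = [["foo", "bar"], ["baz"]]


def expand(position, alternatives=ALTERNATIVES):
    """Return (token, position) pairs for one matched input token."""
    tokens = []
    for words in alternatives:
        for offset, word in enumerate(words):
            tokens.append((word, position + offset))
    # Stable sort: tokens sharing a position keep their alternative order.
    return sorted(tokens, key=lambda pair: pair[1])


print(expand(1))  # [('foo', 1), ('baz', 1), ('bar', 2)]
```

If you expected foo, bar and baz to all sit at the same position, the rule would need commas between all three words on the right-hand side.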