analyzed or not_analyzed, what to choose


I will try to keep it simple; if you need more clarification, just let me know and I'll elaborate.

An "analyzed" field is broken into tokens by the analyzer that you defined for that specific field in your mapping. If you are using the default analyzer (which seems likely, since you refer to values without special characters, let's say server[1-9]), it lowercases the text and breaks it into alphanumeric words (that is not its name, just what it basically does), so it is going to tokenize like this:

this -> HelloWorld123
into -> token1:helloworld123

OR

this -> Hello World 123
into -> token1:hello && token2:world && token3:123

In this case, if you search for HeLlO, the query becomes "hello" and it will match the second document, because the token "hello" is there.
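To make the tokenization above concrete, here is a minimal Python sketch. It only imitates the behaviour described (lowercase, then split on anything that is not a letter or digit); the `analyze` and `matches` helpers are made up for illustration and are not the actual Elasticsearch analyzer.

```python
import re

def analyze(text):
    """Rough stand-in for the default analyzer: lowercase the text,
    then split on any run of characters that is not a-z or 0-9."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def matches(query, field_value):
    """The query is analyzed the same way as the field; it matches
    when at least one query token is among the field's tokens."""
    field_tokens = analyze(field_value)
    return any(tok in field_tokens for tok in analyze(query))

print(analyze("HelloWorld123"))              # ['helloworld123']
print(analyze("Hello World 123"))            # ['hello', 'world', '123']
print(matches("HeLlO", "Hello World 123"))   # True
print(matches("HeLlO", "HelloWorld123"))     # False
```

Note that "HelloWorld123" stays a single token, so a search for "hello" would not find it.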

In the case of not_analyzed fields, no tokenizer is applied at all: the whole value is kept as a single token (your keyword). That being said:

this -> Hello World 123
into -> token1:(Hello World 123)

If you search that field for "hello world 123", it is not going to match, because the comparison is case sensitive (you can still use wildcards such as Hello*, though; let's address that another time).
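The same idea for a not_analyzed field, as a small Python sketch. The helper names are hypothetical, and `fnmatchcase` from the standard library is only used here to imitate a case-sensitive wildcard; it is not how Elasticsearch implements wildcard queries (it also accepts `[...]` patterns, for instance).

```python
from fnmatch import fnmatchcase

def term_matches(query, stored_value):
    """not_analyzed: the stored value is one token, compared as-is."""
    return query == stored_value

def wildcard_matches(pattern, stored_value):
    """Case-sensitive wildcard match against the single stored token."""
    return fnmatchcase(stored_value, pattern)

stored = "Hello World 123"
print(term_matches("hello world 123", stored))   # False (case differs)
print(term_matches("Hello World 123", stored))   # True
print(wildcard_matches("Hello*", stored))        # True
print(wildcard_matches("hello*", stored))        # False
```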

in a nutshell:

Use "analyzed" fields for fields that you are going to search and that you want Elasticsearch to score. Example: titles that contain the word "jobs". query:"title:jobs".

doc1 : title:developer jobs in montreal
doc2 : title:java coder jobs in vancouver
doc3 : title:unix designer jobs in toronto
doc4 : title:database manager vacancies in montreal

This is going to retrieve doc1, doc2 and doc3.

In those cases, "analyzed" fields are what you want.
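As a rough illustration of the title example, the sketch below tokenizes each title and looks for the token "jobs". The `analyze` and `search_title` helpers are hypothetical stand-ins for what the analyzer and the query do, and scoring is left out entirely.

```python
import re

def analyze(text):
    """Simplified analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

titles = {
    "doc1": "developer jobs in montreal",
    "doc2": "java coder jobs in vancouver",
    "doc3": "unix designer jobs in toronto",
    "doc4": "database manager vacancies in montreal",
}

def search_title(word):
    """Return the ids of documents whose title tokens contain the word."""
    query_tokens = analyze(word)
    return [doc_id for doc_id, title in titles.items()
            if any(tok in analyze(title) for tok in query_tokens)]

print(search_title("jobs"))  # ['doc1', 'doc2', 'doc3']
```

doc4 is left out because its title has "vacancies" and no "jobs" token.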

If you know in advance what kind of data will be in that field and you are going to query for exact values, then "not_analyzed" is what you want.

example:

get all the logs from server123.

query:"server:server123".

doc1 : server:server123, log:randomstring, date:01-jan
doc2 : server:server986, log:randomstring, date:01-jan
doc3 : server:server777, log:randomstring, date:01-jan
doc4 : server:server666, log:randomstring, date:01-jan
doc5 : server:server123, log:randomstring, date:02-jan

Results come only from doc1 and doc5.
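The log example boils down to an exact comparison on the whole field value. A minimal Python sketch, with the documents copied from above and a hypothetical `logs_from` helper:

```python
logs = {
    "doc1": {"server": "server123", "log": "randomstring", "date": "01-jan"},
    "doc2": {"server": "server986", "log": "randomstring", "date": "01-jan"},
    "doc3": {"server": "server777", "log": "randomstring", "date": "01-jan"},
    "doc4": {"server": "server666", "log": "randomstring", "date": "01-jan"},
    "doc5": {"server": "server123", "log": "randomstring", "date": "02-jan"},
}

def logs_from(server_name):
    """Exact match on the untouched 'server' value, as with not_analyzed."""
    return [doc_id for doc_id, doc in logs.items()
            if doc["server"] == server_name]

print(logs_from("server123"))  # ['doc1', 'doc5']
print(logs_from("Server123"))  # [] (case sensitive)
```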

Well, I hope you get the point. As I said, keep it simple: it is about what you need.

analyzed -> more space on disk (a LOT more if the analyzed fields are big).
analyzed -> more time for indexing.
analyzed -> better for matching documents.

not_analyzed -> less space on disk.
not_analyzed -> less time for indexing.
not_analyzed -> exact match for fields, or using wildcards.
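For reference, this is roughly how the choice shows up in a mapping, written here as a Python dict and printed as JSON. It assumes the older "string" field syntax where the `index` setting accepts `analyzed` / `not_analyzed` (later Elasticsearch versions split this into separate text and keyword types); the field names are just the ones from the examples above.

```python
import json

# Assumed field names from the examples; legacy "string" mapping syntax.
properties = {
    "title": {"type": "string", "index": "analyzed"},       # full-text search, scored
    "server": {"type": "string", "index": "not_analyzed"},  # exact values only
}

print(json.dumps({"properties": properties}, indent=2))
```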

Regards,

Daniel