Kibana using regex doesn't work as expected Kibana using regex doesn't work as expected elasticsearch elasticsearch

Kibana using regex doesn't work as expected


Tested on version 6.2.4

Added the below index with mapping as shown below

    PUT test{  "mappings": {    "_doc": {      "properties": {        "message": {          "type": "text"        },        "message2": {          "type": "keyword"        }      }    }  }}

Added 2 documents to the index as below

PUT test/_doc/1?refresh{  "message": "hellothere",  "message2":"2018-10-20 23:10:21,068 GMT*8*PEGA0001*8087*1000*8ce767fc2b32*NA*NA*HKVZWM7PHSLMGR3ZXP4OEKEBG3DFFS30K*Test.User*Case-CAS-FS-Work-Svc*Solution:01.03.01*00cb8b6febb234d359369e54a60a865f*Y*3*HKVZWM7PHSLMGR3ZXP4OEKEBG3DFFS30K*35*http-apr-8080-exec-26*STANDARD*com.pega.pegarules.session.internal.engineinterface.service.HttpAPI*192.168.99.100|192.168.99.1*Activity=Pega-UI-CommandPalette.pzGetPaletteOptions*Rule-Obj-Activity:pzGetPaletteOptions*PEGA-UI-COMMANDPALETTE PZGETPALETTEOPTIONS #20161123T194957.445 GMT Step: 2 Circum: 0*NA*****pxRDBIOElapsed=0.03;pxRDBIOCount=4;pxRunStreamCount=811;pxTotalReqCPU=2.81;pxRunModelCount=270;pxOutputBytes=584,268;pxRunWhenCount=1,904;pxDeclarativePageLoadElapsed=6.84;pxRulesExecuted=3,471;pxOtherCount=314;pxDBInputBytes=3,553,909;pxTotalReqTime=8.09;pxActivityCount=967;pxAlertCount=1;pxOtherFromCacheCount=66;pxInteractions=1;pxLegacyRuleAPIUsedCount=1;pxRuleCount=254;pxInputBytes=101;pxRuleIOElapsed=0.09;pxRulesUsed=4,262;pxDeclarativePageLoadCount=6;pxRuleFromCacheCount=254;pxOtherIOElapsed=0.99;pxTrackedPropertyChangesCount=106;pxOtherIOCount=255;*NA*NA*NA*NA*NA*pyActivity=Pega-UI-CommandPalette.pzGetPaletteOptions;primaryPageClass=Data-Portal-DesignerStudio;*HTTP interaction has exceeded the elapsed time alert threshold of 1000 ms: 8088 ms.*" } PUT test/_doc/2?refresh{  "message": "2018-10-20 23:10:21,068 GMT*8*PEGA0001*8087*1000*8ce767fc2b32*NA*NA*HKVZWM7PHSLMGR3ZXP4OEKEBG3DFFS30K*Test.User*Case-CAS-FS-Work-Svc*Solution:01.03.01*00cb8b6febb234d359369e54a60a865f*Y*3*HKVZWM7PHSLMGR3ZXP4OEKEBG3DFFS30K*35*http-apr-8080-exec-26*STANDARD*com.pega.pegarules.session.internal.engineinterface.service.HttpAPI*192.168.99.100|192.168.99.1*Activity=Pega-UI-CommandPalette.pzGetPaletteOptions*Rule-Obj-Activity:pzGetPaletteOptions*PEGA-UI-COMMANDPALETTE PZGETPALETTEOPTIONS #20161123T194957.445 GMT Step: 2 Circum: 0*NA*****pxRDBIOElapsed=0.03;pxRDBIOCount=4;pxRunStreamCount=811;pxTotalReqCPU=2.81;pxRunModelCount=270;pxOutputBytes=584,268;pxRunWhenCount=1,904;pxDeclarativePageLoadElapsed=6.84;pxRulesExecuted=3,471;pxOtherCount=314;pxDBInputBytes=3,553,909;pxTotalReqTime=8.09;pxActivityCount=967;pxAlertCount=1;pxOtherFromCacheCount=66;pxInteractions=1;pxLegacyRuleAPIUsedCount=1;pxRuleCount=254;pxInputBytes=101;pxRuleIOElapsed=0.09;pxRulesUsed=4,262;pxDeclarativePageLoadCount=6;pxRuleFromCacheCount=254;pxOtherIOElapsed=0.99;pxTrackedPropertyChangesCount=106;pxOtherIOCount=255;*NA*NA*NA*NA*NA*pyActivity=Pega-UI-CommandPalette.pzGetPaletteOptions;primaryPageClass=Data-Portal-DesignerStudio;*HTTP interaction has exceeded the elapsed time alert threshold of 1000 ms: 8088 ms.*",  "message2":"hellothere" }

Search for message2: /192.168.99.[0-9]{3}/ results in 0 results

Search for message: /192.168.99.[0-9]{3}/ results in doc#2

Search for message2: /.*192.168.99.[0-9]{3}.*/ results in doc#1

Search for message: /pegarules.session/ results in 0 results.

But search for message: /.*pegarules.session.*/ results in doc#1because inverted index has "token": "com.pega.pegarules.session.internal.engineinterface.service.httpapi"

Search for message2: /.*pegarules.session.*/ results in doc#1`

So, the message filed (type text) is tokenized and regex search for wild card token patterns is returning results.

Where as, the message2 field (type keyword) is not analyzed and is put in to the inverted index as is. Regex search for a pattern like 192.168.99.[0-9]{3} is not returning anything unless we add greedy quantifier (.*)

The Lucene regular expression engine is not Perl-compatible but supports a smaller range of operators so it may not work and match results like regular regex.

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#regexp-syntax