ElasticSearch and Regex queries


You should read Elasticsearch's Regexp Query documentation carefully; you are making some incorrect assumptions about how the regexp query works.

Probably the most important thing to understand here is what you are actually matching against. A regexp query is matched against individual terms, not the entire string. If the field is being indexed with StandardAnalyzer, as I suspect, your dates will be split into multiple terms:

  • "01/01/1901" becomes tokens "01", "01" and "1901"
  • "01 01 1901" becomes tokens "01", "01" and "1901"
  • "01-01-1901" becomes tokens "01", "01" and "1901"
  • "01.01.1901" actually will be a single token: "01.01.1901" (Due to decimal handling, see UAX #29)
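
If you want to confirm how your own data is tokenized, the _analyze API will show you the terms. A minimal sketch of the request bodies in Python, assuming the field uses the built-in "standard" analyzer; the helper name analyze_body is made up for illustration, and you would send each body to GET /_analyze with whatever HTTP client you use:

```python
import json

# Hypothetical helper: builds the JSON body for Elasticsearch's _analyze API.
# Assumes the field is indexed with the built-in "standard" analyzer.
def analyze_body(text, analyzer="standard"):
    return {"analyzer": analyzer, "text": text}

samples = ["01/01/1901", "01 01 1901", "01-01-1901", "01.01.1901"]
bodies = [analyze_body(s) for s in samples]

for body in bodies:
    # Send each body to GET /_analyze; the response lists the tokens produced.
    print(json.dumps(body))
```

The response to each request lists the tokens, which are exactly the strings a regexp query gets to see.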

You can only match a single, whole token with a regexp query.

Elasticsearch (and Lucene) don't support full Perl-compatible regex syntax.

In your first couple of examples, you are using anchors, ^ and $. These are not supported. Your regex must match the entire token to get a match anyway, so anchors are not needed.
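
The whole-token behaviour can be mimicked in Python with re.fullmatch. This is only an analogy: Python's re module is a different engine from Lucene's, and the token list is assumed from the tokenization described above. The helper name matching_tokens is hypothetical.

```python
import re

# Tokens the standard analyzer would produce for "01/01/1901" (per the list above).
tokens = ["01", "01", "1901"]

def matching_tokens(pattern, tokens):
    # re.fullmatch requires the pattern to cover the whole token,
    # which mirrors how a regexp query is matched against each term.
    return [t for t in tokens if re.fullmatch(pattern, t)]

two_digit = matching_tokens(r"[0-9]{2}", tokens)
year = matching_tokens(r"(19|20)[0-9]{2}", tokens)
whole_date = matching_tokens(r"[0-9]{2}/[0-9]{2}/[0-9]{4}", tokens)

print(two_digit)   # ['01', '01']
print(year)        # ['1901']
print(whole_date)  # [] because no single token contains "/"
```

The last pattern is the kind that passes on a regex tester against the full string but can never match here, because the slashes were discarded at index time.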

Shorthand character classes like \d (or \\d) are also not supported. Instead of \\d\\d, use [0-9]{2}.

In your last attempt, you are using /{regex}/g, which is also not supported. Since your regex needs to match the whole string, the global flag wouldn't even make sense in context. Unless you are using a query parser which uses them to denote a regex, your regex should not be wrapped in slashes.
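
Putting those points together, a regexp query against a single token might look like the sketch below. The field name "content" is an assumption, and the body is built as a plain Python dict so it can be passed to any client.

```python
import json

# Bare pattern: no /.../ delimiters, no flags, no ^ or $, and [0-9] instead of \d.
year_pattern = "(19|20)[0-9]{2}"

# Assumes the text lives in a field named "content".
query = {"query": {"regexp": {"content": {"value": year_pattern}}}}

print(json.dumps(query, indent=2))
```

This only finds documents containing a year-like token; it says nothing about the day and month tokens next to it.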

(By the way: How did this one validate on regex101? You have a bunch of unescaped /s. It complains at me when I try it.)


To support this sort of query on such an analyzed field, you'll probably want to look to span queries, and particularly Span Multiterm and Span Near. Perhaps something like:

{
    "span_near" : {
        "clauses" : [
            { "span_multi" : {
                "match": {
                    "regexp": {"content": "0[1-9]|[12][0-9]|3[01]"}
                }
            }},
            { "span_multi" : {
                "match": {
                    "regexp": {"content": "0[1-9]|1[012]"}
                }
            }},
            { "span_multi" : {
                "match": {
                    "regexp": {"content": "(19|20)[0-9]{2}"}
                }
            }}
        ],
        "slop" : 0,
        "in_order" : true
    }
}
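
If you would rather build that query in code than write it by hand, here is a rough Python sketch that produces the same structure. The helper names span_regexp and date_span_query are hypothetical, and the field name is passed in rather than assumed.

```python
import json

def span_regexp(field, pattern):
    # Wrap a regexp query in span_multi so it can be used as a span clause.
    return {"span_multi": {"match": {"regexp": {field: pattern}}}}

def date_span_query(field):
    # Day, month, year; each pattern must match one whole token.
    patterns = ["0[1-9]|[12][0-9]|3[01]", "0[1-9]|1[012]", "(19|20)[0-9]{2}"]
    return {
        "span_near": {
            "clauses": [span_regexp(field, p) for p in patterns],
            "slop": 0,         # tokens must be adjacent
            "in_order": True,  # and appear in the listed order
        }
    }

print(json.dumps(date_span_query("content"), indent=4))
```

Keeping one pattern per token makes it easy to reorder the clauses for a different date layout, such as month before day.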