Escape elasticsearch special characters in PHP


You can use preg_replace with a backreference, as stribizhev pointed out (the simplest way):

$string = "The next chars should be escaped: + - = && || > < ! ( ) { } [ ] ^ \" ~ * ? : \ / Did it work?";

function escapeElasticReservedChars($string) {
    // Character class listing every reserved character
    $regex = "/[\\+\\-\\=\\&\\|\\!\\(\\)\\{\\}\\[\\]\\^\\\"\\~\\*\\<\\>\\?\\:\\\\\\/]/";
    // addslashes('\\$0') yields \\$0: a literal backslash followed by the whole match
    return preg_replace($regex, addslashes('\\$0'), $string);
}

echo escapeElasticReservedChars($string);

Alternatively, use the preg_replace_callback function. Thanks to the callback, you have access to the current match and can edit it before it is substituted.

The PHP manual describes the callback parameter as follows: a callback that will be called and passed an array of matched elements in the subject string. The callback should return the replacement string. This is the callback signature: handler(array $matches): string

Here it is in action:

<?php
$string = "The next chars should be escaped: + - = && || > < ! ( ) { } [ ] ^ \" ~ * ? : \ / Did it work?";

function escapeElasticSearchReservedChars($string) {
    // Character class listing every reserved character
    $regex = "/[\\+\\-\\=\\&\\|\\!\\(\\)\\{\\}\\[\\]\\^\\\"\\~\\*\\<\\>\\?\\:\\\\\\/]/";
    $string = preg_replace_callback(
        $regex,
        function ($matches) {
            // $matches[0] is the matched reserved character; prefix it with a backslash
            return "\\" . $matches[0];
        },
        $string
    );
    return $string;
}

echo escapeElasticSearchReservedChars($string);

Output: The next chars should be escaped\: \+ \- \= \&\& \|\| \> \< \! \( \) \{ \} \[ \] \^ \" \~ \* \? \: \\ \/ Did it work\?
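The two variants above should be interchangeable. The following is a minimal sketch that checks this on a sample string. The function names, the constant name and the sample input are illustrative assumptions, and the character class is written with the redundant escapes dropped, so it is meant to cover the same set of characters rather than reproduce the original pattern verbatim.

```php
<?php
// Same set of reserved characters as in the answer above (assumed equivalent,
// written in a single-quoted string with fewer escapes).
const RESERVED_CHARS_REGEX = '/[+\-=&|!(){}\[\]^"~*<>?:\\\\\/]/';

// Variant 1: backreference in the replacement string.
function escapeWithBackreference(string $value): string
{
    // '\\\\$0' is the four characters \\$0: a literal backslash, then the whole match.
    return preg_replace(RESERVED_CHARS_REGEX, '\\\\$0', $value);
}

// Variant 2: callback that receives the match and returns the replacement.
function escapeWithCallback(string $value): string
{
    return preg_replace_callback(
        RESERVED_CHARS_REGEX,
        function (array $matches): string {
            return '\\' . $matches[0];
        },
        $value
    );
}

$input = 'a+b && (c || d) : "e" \\ /f?';
var_dump(escapeWithBackreference($input) === escapeWithCallback($input)); // bool(true)
```

The callback form only pays off when the replacement depends on the match (for example, dropping some characters and escaping others); for a fixed prefix, the backreference form is shorter.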


If anyone's looking for a slightly verbose (but readable!) solution:

public function escapeElasticsearchValue($searchValue)
{
    $searchValue = str_replace('\\', '\\\\', $searchValue);
    $searchValue = str_replace('*', '\\*', $searchValue);
    $searchValue = str_replace('?', '\\?', $searchValue);
    $searchValue = str_replace('+', '\\+', $searchValue);
    $searchValue = str_replace('-', '\\-', $searchValue);
    $searchValue = str_replace('&&', '\\&&', $searchValue);
    $searchValue = str_replace('||', '\\||', $searchValue);
    $searchValue = str_replace('!', '\\!', $searchValue);
    $searchValue = str_replace('(', '\\(', $searchValue);
    $searchValue = str_replace(')', '\\)', $searchValue);
    $searchValue = str_replace('{', '\\{', $searchValue);
    $searchValue = str_replace('}', '\\}', $searchValue);
    $searchValue = str_replace('[', '\\[', $searchValue);
    $searchValue = str_replace(']', '\\]', $searchValue);
    $searchValue = str_replace('^', '\\^', $searchValue);
    $searchValue = str_replace('~', '\\~', $searchValue);
    $searchValue = str_replace(':', '\\:', $searchValue);
    $searchValue = str_replace('"', '\\"', $searchValue);
    $searchValue = str_replace('=', '\\=', $searchValue);
    $searchValue = str_replace('/', '\\/', $searchValue);

    // < and > can’t be escaped at all. The only way to prevent them from
    // attempting to create a range query is to remove them from the query
    // string entirely
    $searchValue = str_replace('<', '', $searchValue);
    $searchValue = str_replace('>', '', $searchValue);

    return $searchValue;
}
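For comparison, here is a more compact sketch of the same idea using strtr with a replacement map. It is not the answer author's code: the function name is hypothetical, and it is written as a standalone function rather than a class method. strtr scans the string once and prefers the longest matching key, so the backslashes it inserts are never escaped a second time, which is the same concern that makes the chain above replace the backslash first.

```php
<?php
// Hypothetical compact variant of the str_replace chain above.
function escapeElasticsearchValueCompact(string $searchValue): string
{
    static $map = null;

    if ($map === null) {
        // Two-character operators are escaped once; < and > are removed.
        $map = ['&&' => '\\&&', '||' => '\\||', '<' => '', '>' => ''];

        // Single reserved characters, each prefixed with one backslash.
        foreach (str_split('\\*?+-!(){}[]^~:"=/') as $char) {
            $map[$char] = '\\' . $char;
        }
    }

    // strtr() replaces in a single pass, trying the longest keys first.
    return strtr($searchValue, $map);
}

echo escapeElasticsearchValueCompact('price<10 && (a || b)'), PHP_EOL;
```

The verbose chain is easier to audit line by line; the map is easier to extend. Which one reads better is a matter of taste.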


Full disclosure: I've never used Elasticsearch, and my advice is neither from personal experience nor tested against Elasticsearch. It is based on my knowledge of regular expressions and string manipulation. If someone identifies a vulnerability, I'll be happy to receive your comment.

My snippet:

  • first removes all occurrences of < and > from the string, then
  • escapes with a backslash every character that is either in the list of single-occurrence reserved characters, or an ampersand or pipe immediately followed by the same character.

Code:

$string = "To be escaped: + - = && || > < ! ( ) { } [ ] ^ \" ~ * ? : \ / triple ||| and split '&<&'";

echo escapeElasticSearchReservedChars($string);

function escapeElasticSearchReservedChars(string $string): string
{
    return preg_replace(
        [
            '_[<>]+_',
            '_[-+=!(){}[\]^"~*?:\\/\\\\]|&(?=&)|\|(?=\|)_',
        ],
        [
            '',
            '\\\\$0',
        ],
        $string
    );
}

Output:

To be escaped\: \+ \- \= \&& \||   \! \( \) \{ \} \[ \] \^ \" \~ \* \? \: \\ \/ triple \|\|| and split '\&&'

The < and > characters are removed first so that nobody can exploit the order of the replacements. If the removal ran last, an input such as |>| would slip past the escaping step (neither pipe is immediately followed by another pipe) and only become two consecutive, unescaped pipes once the > was removed.
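To make the ordering argument concrete, here is a small sketch that runs the same two patterns in both orders on the |>| input. The patterns are the ones from the snippet above; the constant and function names are illustrative assumptions.

```php
<?php
// Patterns taken from the snippet above; the names are hypothetical.
const STRIP_ANGLES = '_[<>]+_';
const ESCAPE_RESERVED = '_[-+=!(){}[\]^"~*?:\\/\\\\]|&(?=&)|\|(?=\|)_';

// Order used in the answer: remove < and > first, then escape.
function stripThenEscape(string $s): string
{
    return preg_replace([STRIP_ANGLES, ESCAPE_RESERVED], ['', '\\\\$0'], $s);
}

// Reversed order, shown only to illustrate the problem.
function escapeThenStrip(string $s): string
{
    return preg_replace([ESCAPE_RESERVED, STRIP_ANGLES], ['\\\\$0', ''], $s);
}

$attack = 'a |>| b';

echo stripThenEscape($attack), PHP_EOL; // a \|| b
echo escapeThenStrip($attack), PHP_EOL; // a || b  (the || operator survives unescaped)
```

When preg_replace receives arrays, it applies the patterns in array order, each one to the result of the previous, which is why swapping the array entries is enough to change the outcome.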