How to write a recursive regex that matches nested parentheses?


This pattern works:

$pattern = '~ \( (?: [^()]+ | (?R) )*+ \) ~x';

The content inside the parentheses is simply described as:

"anything that is not a parenthesis OR a recursion (= another pair of parentheses)", repeated 0 or more times
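
As a quick check of the pattern above, here is a minimal sketch. The sample string is made up for illustration; only the pattern comes from the answer.

```php
<?php
// Sample input invented for this example.
$subject = 'a (b (c) d) e (f)';
$pattern = '~ \( (?: [^()]+ | (?R) )*+ \) ~x';

preg_match_all($pattern, $subject, $matches);
print_r($matches[0]);
// Only the outermost groups are returned: "(b (c) d)" and "(f)"
```

The inner "(c)" is not listed separately because it has already been consumed as part of the outer match.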

If you want to catch all substrings inside parentheses, including the nested ones, you must put this pattern inside a lookahead to obtain all overlapping results:

$pattern = '~(?= ( \( (?: [^()]+ | (?1) )*+ \) ) )~x';
preg_match_all($pattern, $subject, $matches);
print_r($matches[1]);

Note that I have added a capturing group and I have replaced (?R) by (?1):

(?R) -> refers to the whole pattern (you can write (?0) too)
(?1) -> refers to the first capturing group

What is this lookahead trick?

A subpattern inside a lookahead (or a lookbehind) doesn't consume any characters; it's only an assertion (a test). This makes it possible to check the same substring several times.

If you display the whole pattern results (print_r($matches[0]);), you will see that all results are empty strings. The only way to obtain the substrings found by the subpattern inside the lookahead is to enclose the subpattern in a capturing group.
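
To make this concrete, a small sketch with an input string of my own choosing shows both arrays side by side:

```php
<?php
// Sample input invented for this example.
$subject = '(a (b) c)';
$pattern = '~(?= ( \( (?: [^()]+ | (?1) )*+ \) ) )~x';

preg_match_all($pattern, $subject, $matches);
print_r($matches[0]); // two empty strings: the lookahead consumes nothing
print_r($matches[1]); // "(a (b) c)" and "(b)": outer and nested group
```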

Note: the recursive subpattern can be improved like this:

\( [^()]*+ (?: (?R) [^()]* )*+ \)
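
As far as I can tell, this variant removes the alternation by "unrolling" the loop: it reads a run of non-parenthesis characters first, then repeats "nested group followed by another run". The sketch below only checks that both forms give the same matches on one made-up input; it does not measure speed.

```php
<?php
// Sample input invented for this example.
$subject  = 'f(x, g(y)) + h()';
$simple   = '~ \( (?: [^()]+ | (?R) )*+ \) ~x';
$unrolled = '~ \( [^()]*+ (?: (?R) [^()]* )*+ \) ~x';

preg_match_all($simple, $subject, $a);
preg_match_all($unrolled, $subject, $b);

var_dump($a[0] === $b[0]); // bool(true)
print_r($b[0]);            // "(x, g(y))" and "()"
```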


When I found this answer, I wasn't able to figure out how to modify the pattern to work with my own delimiters, which were { and }. So my approach was to make it more generic.

Here is a script to generate the regex pattern with your own variable left and right delimiters.

$delimiter_wrap  = '~';
/* put YOUR left delimiter here.  */
$delimiter_left  = '{';
/* put YOUR right delimiter here. */
$delimiter_right = '}';

$delimiter_left  = preg_quote( $delimiter_left,  $delimiter_wrap );
$delimiter_right = preg_quote( $delimiter_right, $delimiter_wrap );
$pattern         = $delimiter_wrap . $delimiter_left
                 . '((?:[^' . $delimiter_left . $delimiter_right . ']++|(?R))*)'
                 . $delimiter_right . $delimiter_wrap;

/* Now you can use the generated pattern. */
preg_match_all( $pattern, $subject, $matches );
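
For reuse, the same steps can be wrapped in a function. This is only a sketch: the name buildNestedPattern and the sample string are my own, not part of the script above.

```php
<?php
// Hypothetical helper; builds the same pattern as the script above.
function buildNestedPattern(string $left, string $right, string $wrap = '~'): string
{
    // Escape the delimiters so they are taken literally in the regex
    $left  = preg_quote($left, $wrap);
    $right = preg_quote($right, $wrap);

    return $wrap . $left
         . '((?:[^' . $left . $right . ']++|(?R))*)'
         . $right . $wrap;
}

$pattern = buildNestedPattern('{', '}');
preg_match_all($pattern, 'x {a {b} c} y {d}', $matches);
print_r($matches[0]); // "{a {b} c}" and "{d}"
print_r($matches[1]); // "a {b} c" and "d"
```

Because the delimiters are placed inside a character class, this approach is only meant for single-character delimiters.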


The following code uses my Parser class (it's under CC-BY 3.0). It works on UTF-8 (thanks to my UTF8 class).

It works by using a recursive function to iterate over the string. The function calls itself each time it finds a (. It also detects mismatched pairs when it reaches the end of the string without finding the corresponding ).

This code also takes a $callback parameter that you can use to process each piece it finds. The callback receives two parameters: 1) the string, and 2) the level (0 = deepest). Whatever the callback returns replaces that piece in the contents of the string (these changes are visible to the callbacks at higher levels).

Note: the code does not include type checks.

Non-recursive part:

function ParseParenthesis(/*string*/ $string, /*function*/ $callback)
{
    //Create a new parser object
    $parser = new Parser($string);
    //Call the recursive part
    $result = ParseParenthesisFragment($parser, $callback);
    if ($result['close'])
    {
        return $result['contents'];
    }
    else
    {
        //UNEXPECTED END OF STRING
        // throw new Exception('UNEXPECTED END OF STRING');
        return false;
    }
}

Recursive part:

function ParseParenthesisFragment(/*parser*/ $parser, /*function*/ $callback)
{
    $contents = '';
    $level = 0;
    while(true)
    {
        $parenthesis = array('(', ')');
        // Jump to the first/next "(" or ")"
        $new = $parser->ConsumeUntil($parenthesis);
        $parser->Flush(); //<- Flush is just an optimization
        // Append what we got so far
        $contents .= $new;
        // Read the "(" or ")"
        $element = $parser->Consume($parenthesis);
        if ($element === '(') //If we found "("
        {
            //OPEN
            $result = ParseParenthesisFragment($parser, $callback);
            if ($result['close'])
            {
                // It was closed, all ok
                // Update the level of this iteration
                $newLevel = $result['level'] + 1;
                if ($newLevel > $level)
                {
                    $level = $newLevel;
                }
                // Call the callback
                $new = call_user_func
                (
                    $callback,
                    $result['contents'],
                    $level
                );
                // Append what we got
                $contents .= $new;
            }
            else
            {
                //UNEXPECTED END OF STRING
                // Don't call the callback for mismatched parentheses,
                // just append and return
                return array
                (
                    'close' => false,
                    'contents' => $contents.$result['contents']
                );
            }
        }
        else if ($element === ')') //If we found a ")"
        {
            //CLOSE
            return array
            (
                'close' => true,
                'contents' => $contents,
                'level' => $level
            );
        }
        else
        {
            //END OF STRING
            // Neither "(" nor ")" could be consumed. The original test here
            // read $result['status'], a key that is never set (and $result
            // may be undefined), so a plain else behaves the same without
            // the notices.
            return array
            (
                'close' => false,
                'contents' => $contents
            );
        }
    }
}
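
If you don't have the Parser class at hand, the same recursive idea can be sketched with plain string functions. This is not the code above: the function name parseFragment, the exceptions, and the treatment of the top level (text outside any parentheses is allowed) are my own assumptions. It scans bytes, which is safe for UTF-8 here because ( and ) are single-byte characters.

```php
<?php
// Sketch only: plain string functions stand in for the Parser class.
// Returns [processed contents, nesting level of this fragment].
function parseFragment(string $s, int &$pos, callable $callback, bool $inGroup = false): array
{
    $contents = '';
    $level = 0; // 0 = no nested group inside this fragment
    $len = strlen($s);
    while ($pos < $len) {
        // Copy everything up to the next "(" or ")"
        $run = strcspn($s, '()', $pos);
        $contents .= substr($s, $pos, $run);
        $pos += $run;
        if ($pos === $len) {
            break; // end of string
        }
        $char = $s[$pos++];
        if ($char === ')') {
            if (!$inGroup) {
                throw new RuntimeException('Unexpected ")"');
            }
            return [$contents, $level];
        }
        // Found "(": parse the nested group, then let the callback rewrite it
        [$inner, $innerLevel] = parseFragment($s, $pos, $callback, true);
        $level = max($level, $innerLevel + 1);
        $contents .= $callback($inner, $innerLevel);
    }
    if ($inGroup) {
        throw new RuntimeException('Unexpected end of string');
    }
    return [$contents, $level];
}

$pos = 0;
[$out, $level] = parseFragment('a (b (c) d) e', $pos, function ($text, $lvl) {
    return '[' . $lvl . ':' . $text . ']';
});
echo $out, "\n"; // a [1:b [0:c] d] e
```

In this sketch the callback receives the level of the group it is processing, so a group with no parentheses inside it gets level 0.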