Finding matching portions of two strings in PHP Finding matching portions of two strings in PHP php php

Finding matching portions of two strings in PHP


Assuming your strings are $a and $b, respectively, you can use this:

$a = 'http://2.2.2.2/~machinehost/deployment_folder/';$b = '/~machinehost/deployment_folder/users/bob/settings';$len_a = strlen($a);$len_b = strlen($b);for ($p = max(0, $len_a - $len_b); $p < $len_b; $p++)    if (substr($a, $len_a - ($len_b - $p)) == substr($b, 0, $len_b - $p))        break;$result = $a.substr($b, $len_b - $p);echo $result;

This result is http://2.2.2.2/~machinehost/deployment_folder/users/bob/settings.


Finding the longest common match can also be done using regex.

The below function will take two strings, use one to create a regex, and execute it against the other.

/** * Determine the longest common match within two strings * * @param string $str1 * @param string $str2 Two strings in any order. * @param boolean $case_sensitive Set to true to force * case sensitivity. Default: false (case insensitive). * @return string The longest string - first match. */function get_longest_common_subsequence( $str1, $str2, $case_sensitive = false ) {    // First check to see if one string is the same as the other.    if ( $str1 === $str2 ) return $str1;    if ( ! $case_sensitive && strtolower( $str1 ) === strtolower( $str2 ) ) return $str1;    // We'll use '#' as our regex delimiter. Any character can be used as we'll quote the string anyway,    $delimiter = '#';    // We'll find the shortest string and use that to check substrings and create our regex.    $l1 = strlen( $str1 );    $l2 = strlen( $str2 );    $str = $l1 <= $l2 ? $str1 : $str2;    $str2 = $l1 <= $l2 ? $str2 : $str1;    $l = min( $l1, $l2 );    // Next check to see if one string is a substring of the other.    if ( $case_sensitive ) {        if ( strpos( $str2, $str ) !== false ) {            return $str;        }    }    else {        if ( stripos( $str2, $str ) !== false ) {            return $str;        }    }    // Regex for each character will be of the format (?:a(?=b))?    // We also need to capture the last character, but this prevents us from matching strings with a single character. (?:.|c)?    $reg = $delimiter;    for ( $i = 0; $i < $l; $i++ ) {        $a = preg_quote( $str[ $i ], $delimiter );        $b = $i + 1 < $l ? preg_quote( $str[ $i + 1 ], $delimiter ) : false;        $reg .= sprintf( $b !== false ? '(?:%s(?=%s))?' : '(?:.|%s)?', $a, $b );    }    $reg .= $delimiter;    if ( ! $case_sensitive ) {        $reg .= 'i';    }    // Resulting example regex from a string 'abbc':    // '#(?:a(?=b))?(?:b(?=b))?(?:b(?=c))?(?:.|c)?#i';    // Perform our regex on the remaining string    $str = $l1 <= $l2 ? $str2 : $str1;    if ( preg_match_all( $reg, $str, $matches ) ) {        // $matches is an array with a single array with all the matches.        return array_reduce( $matches[0], function( $a, $b ) {            $al = strlen( $a );            $bl = strlen( $b );            // Return the longest string, as long as it's not a single character.            return $al >= $bl || $bl <= 1 ? $a : $b;        }, '' );    }    // No match - Return an empty string.    return '';}

It'll generate a regex using the shorter of the two strings, although performance will most likely be the same either way. It may incorrectly match strings with recurring substrings, and we're limited to matching strings of two characters or more, unless they are equal or one is a substring of the other. For Instance:

// Works as intended.get_longest_common_subsequence( 'abbc', 'abc' ) === 'ab';// Returns incorrect substring based on string length and recurring substrings.get_longest_common_subsequence( 'abbc', 'abcdef' ) === 'abc';// Does not return any matches, as all recurring strings are only a single character long.get_longest_common_subsequence( 'abc', 'ace' ) === '';// One of the strings is a substring of the other.get_longest_common_subsequence( 'abc', 'a' ) === 'a';

Regardless, it functions using an alternate method and the regex can be refined to tackle additional situations.


I'm not sure to understand your full request, but the idea is:

Let A be your URL and B your "/~machinehost/deployment_folder/users/bob/settings"

  • search B in A -> you get an index i (where i is the position of the first / of B in A)
  • let l = length(A)
  • You need to cut B from (l-i) to length(B) to grab the last part of B (/users/bob/settings)

I have not tested yet, but if you really need, I can help you make this brilliant (ironical) solution work.

Note that it may be possible with regular expressions like

$pattern = "$B(.*?)"$res = array();preg_match_all($pattern, $A, $res);

Edit: I think your last comment invalidates my response. But what you want is finding substrings. So you can first start with a heavy algorithm trying to find B[1:i] in A for i in {2, length(B)} and then use some dynamic programming stuffs.