Regexp finding longest common prefix of two strings Regexp finding longest common prefix of two strings ruby ruby

Regexp finding longest common prefix of two strings


If there's some character that neither string contains —, say, \0 — you could write

"$first\0$second" =~ m/^(.*).*\0\1/s;

and the longest common prefix would be saved as $1.


Edited to add: This is obviously very inefficient. I think that if efficiency is a concern, then this simply isn't the approach we should be using; but we can at least improve it by changing .* to [^\0]* to prevent useless greediness that will just have to be backtracked again, and wrapping the second [^\0]* in (?>…) to prevent backtracking that can't help. This:

"$first\0$second" =~ m/^([^\0]*)(?>[^\0]*)\0\1/s;

This will yield the same result, but much more efficiently. (But still not nearly as efficiently as a straightforward non–regex-based approach. If the strings both have length n, I'd expect its worst case to take at least O(n2) time, whereas the straightforward non–regex-based approach would take O(n) time in its worst case.)


Here's a Python one-liner:

>>> a = 'stackoverflow'>>> b = 'stackofpancakes'>>> a[:[x[0]==x[1] for x in zip(a,b)].index(0)]0: 'stacko'>>> a = 'nothing in'>>> b = 'common'>>> a[:[x[0]==x[1] for x in zip(a,b)].index(0)]1: ''>>> 


Here's one fairly efficient way which uses a regexp. The code is in Perl, but the principle should be adaptable to other languages:

my $xor = "$first" ^ "$second";    # quotes force string xor even for numbers$xor =~ /^\0*/;                    # match leading null charactersmy $common_prefix_length = $+[0];  # get length of match

(A subtlety worth noting is that Perl's string XOR operator (^) in effect pads the shorter string with nulls to match the length of the longer one. Thus, if the strings might contain null characters, and if the shorter string happens to be a prefix of the longer one, the common prefix length calculated with this code might exceed the length of the shorter string.)