Negative Lookahead Regex greed (why is .*? too greedy)

python regex perl regex-lookarounds

#!/usr/bin/perl

sub test_re {
    $arg    = $_[0];
    $INSULTSTR = $_[1];
    $INSULTSTR =~ /(?:^|\.\s*)(?:(?![^.]*?$arg[^.]*\.))([^.]*\.)/;
    if ($1) {
        print "neg-lookahead($arg) MATCHED: '$1'\n";
    } else {
        print "Unable to match: neg-lookahead($arg) in '$INSULTSTR'\n";
    }
}

$INSULT = 'Yomama is ugly.  And, she smells like an wet dog.';

test_re('Yomama', $INSULT);
test_re('ugly', $INSULT);
test_re('looks', $INSULT);
test_re('And', $INSULT);
test_re('And,', $INSULT);
test_re('smells', $INSULT);
test_re('dog', $INSULT);

Results:

neg-lookahead(Yomama) MATCHED: 'And, she smells like an wet dog.'
neg-lookahead(ugly) MATCHED: 'And, she smells like an wet dog.'
neg-lookahead(looks) MATCHED: 'Yomama is ugly.'
neg-lookahead(And) MATCHED: 'Yomama is ugly.'
neg-lookahead(And,) MATCHED: 'Yomama is ugly.'
neg-lookahead(smells) MATCHED: 'Yomama is ugly.'
neg-lookahead(dog) MATCHED: 'Yomama is ugly.'
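
Since the question is also tagged python, here is a rough Python port of the same harness, as a sketch rather than an exact translation. The function name first_sentence_without is made up for illustration, and re.escape is an addition: the Perl version interpolates $arg into the pattern unescaped. With the same pattern and input it should print the same seven lines as the Perl script.

```python
import re

INSULT = "Yomama is ugly.  And, she smells like an wet dog."


def first_sentence_without(word, text):
    """Return the first sentence of text that does not contain word, or None.

    Hypothetical helper: same pattern as the Perl script, with the word
    escaped so it is always treated as a literal.
    """
    pattern = (
        r"(?:^|\.\s*)"                                  # start of text or end of previous sentence
        r"(?![^.]*?" + re.escape(word) + r"[^.]*\.)"    # sentence must not contain word
        r"([^.]*\.)"                                    # capture the sentence itself
    )
    m = re.search(pattern, text)
    return m.group(1) if m else None


for word in ["Yomama", "ugly", "looks", "And", "And,", "smells", "dog"]:
    sentence = first_sentence_without(word, INSULT)
    if sentence:
        print(f"neg-lookahead({word}) MATCHED: {sentence!r}")
    else:
        print(f"Unable to match: neg-lookahead({word}) in {INSULT!r}")
```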

If you're curious about what Perl is doing with a regex, you can run with the regex debugger:

perl -Dr -e '"A two. A one." =~ /(?![A-Z][^\.]*(?:two)[^\.]*\.)([A-Z][^\.]+\.)/; print ">$1<\n"'

which generates a lot of output for you to ponder. Note that this requires a Perl built with -DDEBUGGING.

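Your problem is that the regex engine tries as hard as it can to match the pattern inside the negative lookahead, [A-Z].*?$arg.*?\. A lazy .*? only prefers the shortest match; it still expands whenever that is the only way for the rest of the pattern to succeed, and nothing stops it from consuming a period. So in the "smells" case, the lookahead pattern starting at "Yomama" ends up matching the whole string (the period in the middle is absorbed by one of the .*? constructs), and the negative lookahead rejects that position. You should restrict the pattern inside the negative lookahead so that it can match only as much as the capturing part can: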

Instead of:

(?:(?![A-Z].*?$arg.*?\.))([A-Z].*?\.)

Use:

(?:(?![A-Z][^.]*$arg[^.]*\.))([A-Z].*?\.)

Now, the negative lookahead cannot match more of the string than the other part can, since it must stop at the first period.
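
To see the difference concretely, here is a small Python sketch (Python's re module treats these constructs the same way as Perl does here). It first runs the two lookahead bodies on their own, then the two full patterns. The helper name first_match and its bounded flag are made up for illustration, and re.escape is an addition to keep the word literal.

```python
import re

TEXT = "Yomama is ugly.  And, she smells like an wet dog."

# The two lookahead bodies on their own, tried at the start of TEXT.
# The lazy version is allowed to cross the period after "ugly".
lazy_body = re.match(r"[A-Z].*?smells.*?\.", TEXT)
# The bounded version cannot leave the first sentence, so it finds nothing.
bounded_body = re.match(r"[A-Z][^.]*smells[^.]*\.", TEXT)


def first_match(word, bounded):
    """Hypothetical helper: first sentence not containing word, or None."""
    w = re.escape(word)
    if bounded:
        pattern = r"(?![A-Z][^.]*" + w + r"[^.]*\.)([A-Z].*?\.)"
    else:
        pattern = r"(?![A-Z].*?" + w + r".*?\.)([A-Z].*?\.)"
    m = re.search(pattern, TEXT)
    return m.group(1) if m else None


print(lazy_body.group(0) == TEXT)           # lazy body swallows the whole string
print(bounded_body)                         # bounded body does not match at all
print(first_match("smells", bounded=False)) # every candidate position is rejected
print(first_match("smells", bounded=True))  # first sentence is accepted
```

With the lazy lookahead, both sentence starts are rejected because "smells" is reachable from each of them; with [^.]* the check is confined to the sentence being tested.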
