stringr, str_extract: how to do positive lookbehind?


You need to use str_match, since the "lookbehind" part of your pattern is a literal surrounded by an unknown number of whitespace characters:

> result_1 <- str_match(myStrings, "MFG\\s*:\\s*(\\w+)")
> result_1[,2]
## [1] "acme"    NA        "initech"

The results you need will be in the second column.

Note that str_extract cannot be used with this pattern, since it returns only the whole match and drops the captured values.

And a bonus: ICU regex does not support infinite-width lookbehind (no * or + inside it), but it does support constrained-width lookbehind, so bounded quantifiers such as {0,100} are allowed. So, this will also work:

> result_1 <- str_extract(myStrings, "(?<=MFG\\s{0,100}:\\s{0,100})\\w+")
> result_1
[1] "acme"    NA        "initech"


We can use a regex lookaround. The lookbehind here matches only the exact text "MFG:" followed by a single whitespace character.

str_extract(myStrings, "(?<=MFG:\\s)\\w+")
#[1] "acme"    NA        "initech"


I wrote the code in Python using a lookbehind. If the pattern finds "MFG:", it grabs the next word (the match includes the whitespace before the word).

import re

txt = "MFG: acme, something else, MFG: initech"
pattern = r"(?<=MFG\:)\s+\w+"
matches = re.findall(pattern, txt)
for match in matches:
    print(match)

output:

 acme
 initech
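
A variant of the Python snippet above, offered as a minimal sketch rather than a definitive solution. Python's re module only accepts fixed-width lookbehind, so a variable amount of whitespace cannot go inside it. A capture group (the same idea as the str_match answer) avoids that limit and also keeps the whitespace out of the result. The names MFG_WORD, words and lookbehind_ok are illustrative and do not come from the original answers.

```python
import re

# Sample text reused from the snippet above.
txt = "MFG: acme, something else, MFG: initech"

# A lookbehind containing \s* has no fixed width, so re refuses to compile it.
try:
    re.compile(r"(?<=MFG:\s*)\w+")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Capture-group alternative: the optional whitespace is matched outside the
# group, so only the word itself is returned by findall.
MFG_WORD = re.compile(r"MFG\s*:\s*(\w+)")
words = MFG_WORD.findall(txt)

print(lookbehind_ok)  # False
print(words)          # ['acme', 'initech']
```

Because the pattern has exactly one group, findall returns the group contents directly instead of the full matches.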