stringr, str_extract: how to do positive lookbehind?
You need to use str_match, since the text you would put in the lookbehind is a known literal and only the amount of whitespace around the colon is unknown:
> result_1 <- str_match(myStrings, "MFG\\s*:\\s*(\\w+)")
> result_1[,2]
##[1] "acme"    NA        "initech"
The results you need will be in the second column.
Note that str_extract cannot be used with this pattern, since that function returns the whole match and drops the captured values.
And a bonus: a lookbehind in ICU regex cannot be infinite-width (no * or + inside it), but it can be constrained-width, so a bounded quantifier such as {0,100} is allowed. So, this will also work:
> result_1 <- str_extract(myStrings, "(?<=MFG\\s{0,100}:\\s{0,100})\\w+")
> result_1
[1] "acme"    NA        "initech"
We can use a regex lookbehind. As written, it only matches when "MFG:" is followed by exactly one whitespace character.
str_extract(myStrings, "(?<=MFG:\\s)\\w+")
#[1] "acme"    NA        "initech"
I wrote the code in Python using a lookbehind. If the parser finds MFG:, it grabs the next word.
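import re

txt = "MFG: acme, something else, MFG: initech"
# The lookbehind anchors the match right after "MFG:". The match itself
# still includes the leading whitespace, so strip it before printing.
pattern = r"(?<=MFG\:)\s+\w+"
matches = re.findall(pattern, txt)
for match in matches:
    print(match.strip())
output:
acme
initech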
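Python's re module only accepts fixed-width lookbehinds, so the variable whitespace cannot be moved inside the lookbehind there. A capture group, the same idea as the str_match approach above, sidesteps this. The following is a minimal sketch; the sample text, with its uneven spacing around the colon, is made up for illustration.

```python
import re

# Hypothetical sample text; spacing around the colon varies on purpose.
txt = "MFG: acme, something else, MFG  :   initech"

# With exactly one capture group, re.findall returns the group contents
# rather than the whole match, so no lookbehind is needed.
pattern = r"MFG\s*:\s*(\w+)"
names = re.findall(pattern, txt)
print(names)
```

Here names holds only the captured words, with no surrounding whitespace to strip.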