
stringr str_extract capture group capturing everything


The capture group makes no difference here: str_extract always returns the whole match, including the characters matched before and after the capture group.

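Use a lookbehind and a lookahead instead. These are zero-width assertions: they check that the surrounding text is present without consuming it, so that text is not part of the match.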

    library(stringr)
    str_extract(string = 'X2015.XML.Outgoing.pounds..millions.',
                pattern = '(?<=X)\\d{4}(?=\\.)')
    # [1] "2015"

This regex matches four consecutive digits that are preceded by an X and followed by a literal period. Neither the X nor the period is included in the result.
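The difference is easiest to see side by side. The sketch below runs both patterns over a small, made-up vector of similar strings (the vector cols and its second element are illustrative assumptions, not taken from the question). The capture-group version still returns the leading X and the trailing period, while the lookaround version returns only the digits.

```r
library(stringr)

# Hypothetical sample strings shaped like the one in the question
cols <- c("X2015.XML.Outgoing.pounds..millions.",
          "X2016.XML.Outgoing.pounds..millions.")

# With a capture group, str_extract() still returns the whole match
with_group <- str_extract(cols, "X(\\d{4})\\.")
# c("X2015.", "X2016.")

# With lookarounds, the X and the period are required but not consumed
with_lookaround <- str_extract(cols, "(?<=X)\\d{4}(?=\\.)")
# c("2015", "2016")

print(with_group)
print(with_lookaround)
```

Both calls are vectorised, so the same pattern works unchanged on a whole vector of column names.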


Alternatively, you can use gsub and replace the match with the contents of the capture group (\\1):

    string <- 'X2015.XML.Outgoing.pounds..millions.'
    gsub("X(\\d{4})\\..*", "\\1", string)
    # [1] "2015"

or str_replace from stringr:

    library(stringr)
    str_replace(string, "X(\\d{4})\\..*", "\\1")
    # [1] "2015"
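One behaviour worth knowing with the replacement approach (both gsub and str_replace): an element that does not match the pattern is returned unchanged rather than as NA, unlike str_extract. The sketch below assumes a hypothetical second string with no year in it, and shows one possible way to turn non-matches into NA using str_detect.

```r
library(stringr)

# Made-up input: the second element contains no four-digit year
strings <- c("X2015.XML.Outgoing.pounds..millions.",
             "Total.pounds..millions.")

# Non-matching elements come back untouched, not as NA
replaced_base    <- gsub("X(\\d{4})\\..*", "\\1", strings)
replaced_stringr <- str_replace(strings, "X(\\d{4})\\..*", "\\1")
# both: c("2015", "Total.pounds..millions.")

# Keep the replacement only where the pattern actually matched
years <- ifelse(str_detect(strings, "X\\d{4}\\."),
                replaced_stringr,
                NA_character_)
# c("2015", NA)

print(years)
```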


I believe the most idiomatic way is to use str_match:

    str_match(string = 'X2015.XML.Outgoing.pounds..millions.',
              pattern = 'X(\\d{4})\\.')

This returns a character matrix: the first column holds the complete match and each further column holds one capture group:

         [,1]     [,2]
    [1,] "X2015." "2015"

Indexing the result with [2] therefore returns just the captured year:

    str_match(string = 'X2015.XML.Outgoing.pounds..millions.',
              pattern = 'X(\\d{4})\\.')[2]
    # [1] "2015"
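The [2] index works above only because the matrix has a single row. With several input strings, a safer choice is to select the capture-group column with [, 2]. The sketch below uses a made-up three-element vector (an assumption for illustration) to show this, and also that a non-matching element yields NA in every column.

```r
library(stringr)

# Hypothetical input; the third element has no year
strings <- c("X2015.XML.Outgoing.pounds..millions.",
             "X2016.XML.Outgoing.pounds..millions.",
             "Total.pounds..millions.")

m <- str_match(strings, "X(\\d{4})\\.")
# m is a 3 x 2 matrix: column 1 = full match, column 2 = capture group 1

years <- m[, 2]
# c("2015", "2016", NA)

# Linear indexing walks the matrix column by column, so with more than
# one row, m[2] is the second row's full match, not a capture group
m[2]
# "X2016."
```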