
Extract a regular expression match


Use the stringr package, which wraps the existing regular expression operations in a consistent syntax and adds a few that are missing:

```r
library(stringr)

str_locate("aaa12xxx", "[0-9]+")
#      start end
# [1,]     4   5

str_extract("aaa12xxx", "[0-9]+")
# [1] "12"
```
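A small sketch of how these functions behave on a character vector (the input strings are made up for illustration): `str_extract()` returns `NA` for elements with no match, and `str_extract_all()` returns a list holding every match for each element.

```r
library(stringr)

# Hypothetical inputs: one string with two digit runs, one with none
x <- c("aaa12xxx34", "no digits here")

# First match per element; NA where nothing matches
first <- str_extract(x, "[0-9]+")
print(first)
# [1] "12" NA

# All matches per element, as a list of character vectors
# (an element with no match gives character(0))
all_matches <- str_extract_all(x, "[0-9]+")
print(all_matches)
```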


It is probably a bit hasty to say 'ignore the standard functions'; the help file for `?gsub` even mentions, under 'See also':

‘regmatches’ for extracting matched substrings based on the results of ‘regexpr’, ‘gregexpr’ and ‘regexec’.

So this will work, and is fairly simple:

```r
txt <- "aaa12xxx"
regmatches(txt, regexpr("[0-9]+", txt))
# [1] "12"
```
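When the input is a vector, note that `regmatches()` with a `regexpr()` result returns only the elements that matched, so the output can be shorter than the input. The `regexec()` function mentioned in the help text also reports capture groups. A minimal sketch, with example strings and a grouping pattern chosen purely for illustration:

```r
# Hypothetical inputs: the middle string has no digits
txt <- c("aaa12xxx", "no digits", "b7")

# regexpr() finds the first match in each element (-1 where there is none)
m <- regexpr("[0-9]+", txt)

# Non-matching elements are dropped, so this has length 2, not 3
first <- regmatches(txt, m)
# c("12", "7")

# regexec() returns, per element, the full match followed by each group;
# elements without a match give character(0)
parts <- regmatches(txt, regexec("([a-z]+)([0-9]+)", txt))
# parts[[1]] is c("aaa12", "aaa", "12")
```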


For your specific case, you could remove every character that is not a digit:

```r
gsub("[^0-9]", "", "aaa12xxxx")
# [1] "12"
```

It won't work in more complex cases, for example when the string contains more than one run of digits:

```r
gsub("[^0-9]", "", "aaa12xxxx34")
# [1] "1234"
```
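For that multi-run case, one option is `gregexpr()` (also listed in the help text quoted above), which records every match rather than just the first. A minimal sketch reusing the same example string:

```r
txt <- "aaa12xxxx34"

# gregexpr() records every match, not just the first one
m <- gregexpr("[0-9]+", txt)

# With a gregexpr() result, regmatches() returns a list with one
# character vector of matches per input string
runs <- regmatches(txt, m)[[1]]
print(runs)
# [1] "12" "34"
```

Unlike the `gsub()` approach, the two digit runs stay separate instead of being glued into `"1234"`.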