How to delete a segment of a string with a specific start and end in R using regular expressions?


This will do it:

gsub(" : .*?L", "", str)
#[1] "F14"           "W15, W15"      "W15, F14, F14"
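The `?` after `.*` is what makes this work on elements with several segments: it makes the match lazy, so each match stops at the first `L` after ` : ` instead of running to the last one. A minimal sketch of the difference, assuming the sample vector from the question (the names `lazy` and `greedy` are only for illustration):

```r
# Sample data as used elsewhere in this thread
str <- c("F14 : M114L",
         "W15 : M116L, W15 : M118L",
         "W15 : D111L, F14 : E112L, F14 : M116L")

# Lazy quantifier (.*?): each match ends at the first "L" after " : "
lazy <- gsub(" : .*?L", "", str)
print(lazy)
# [1] "F14"           "W15, W15"      "W15, F14, F14"

# Greedy quantifier (.*): the match runs to the last "L" in the element,
# so everything after the first code is removed
greedy <- gsub(" : .*L", "", str)
print(greedy)
# [1] "F14" "W15" "W15"
```

If the segment to delete could itself contain an `L` before its real end, neither pattern is enough and the end marker would need to be more specific.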


You can do this with ease using the qdapRegex package that I maintain:

str = c("F14 : M114L", "W15 : M116L, W15 : M118L", "W15 : D111L, F14 : E112L, F14 : M116L")
library(qdapRegex)
rm_between(str, "\\s:", "L")
## [1] "F14"           "W15, W15"      "W15, F14, F14"

qdapRegex aims to be useful as it teaches. If you are interested in the regex used...

S("@rm_between", "\\s:", "L")
## [1] "(\\s:)(.*?)(L)"
gsub(S("@rm_between", "\\s:", "L"), "", str)


A couple of approaches.

Take the first few letters if it's always three:

substr(str,1,3)

I personally like stringr too. It makes extraction really straightforward: write a pattern for what you want, not for what you don't want.

library(stringr)
str_extract(str, "[A-Z][0-9]*")

I've simplified these for a plain vector, but since your elements contain several comma-separated sub-elements, you'll need to split them first, something like:

splits <- strsplit(str, ", ")
result <- lapply(splits, substr, start = 1, stop = 3)

or

result <- lapply(splits, str_extract, pattern = "[A-Z][0-9]*")
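Note that `lapply` returns a list of character vectors, so one more step is needed to get back a single character vector like the one the other answers produce. A rough sketch of that step in base R, assuming (as above) that every code is exactly three characters long; the name `collapsed` is just for illustration:

```r
# Sample data as used elsewhere in this thread
str <- c("F14 : M114L",
         "W15 : M116L, W15 : M118L",
         "W15 : D111L, F14 : E112L, F14 : M116L")

# Split each element into its comma-separated pieces
splits <- strsplit(str, ", ")

# Keep the first three characters of every piece (assumes 3-character codes),
# then glue the pieces of each element back together
collapsed <- vapply(
  splits,
  function(parts) paste(substr(parts, 1, 3), collapse = ", "),
  character(1)
)

print(collapsed)
# [1] "F14"           "W15, W15"      "W15, F14, F14"
```

`vapply` with `character(1)` is used here rather than `sapply` so that the result is guaranteed to be a character vector with one string per input element.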