Pattern matching using a wildcard


If you want to examine elements inside a data frame, you should not be using ls(), which only looks at the names of objects in the current workspace (or, if used inside a function, in the current environment). Row names or elements inside such objects are not visible to ls() (unless of course you add an environment argument to the ls(.) call). Try using grep(), which is the workhorse function for pattern matching on character vectors:

result <- a[ grep("blue", a$x) , ]  # Note need to use `a$` to get at the `x`

If you want to use subset(), then consider the closely related function grepl(), which returns a logical vector that can be used in the subset argument:

subset(a, grepl("blue", a$x))
      x
2 blue1
3 blue2
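To make the difference between ls() and grep()/grepl() concrete, here is a minimal sketch. It assumes a data frame shaped like the one in the question; the scratch environment `env` and the variable names are only there for illustration.

```r
# Assumed data, mirroring the data frame from the question
a <- data.frame(x = c("red", "blue1", "blue2", "red2"))

# ls() matches the pattern against object *names* in an environment.
# The only object here is called "a", so nothing matches "blue*".
env <- new.env()
assign("a", a, envir = env)
objs <- ls(envir = env, pattern = glob2rx("blue*"))

# grep() returns the positions of matching elements of a character vector
idx <- grep("blue", a$x)

# grepl() returns one TRUE/FALSE per element, which suits subset()
keep <- grepl("blue", a$x)

print(objs)
print(idx)
print(keep)
```

grep() is convenient for indexing by position, while grepl() is the safer choice whenever a logical condition is expected.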

Edit: Adding one "proper" use of glob2rx within subset():

result <- subset(a, grepl(glob2rx("blue*"), x))
result
      x
2 blue1
3 blue2

I don't think I actually understood glob2rx until I came back to this question. (I did understand the scoping issues that were at the root of the questioner's difficulties. Anybody reading this should now scroll down to Gavin's answer and upvote it.)


glob2rx() converts a pattern including a wildcard into the equivalent regular expression. You then need to pass this regular expression onto one of R's pattern matching tools.

If you want to match "blue*", where * has the usual wildcard meaning rather than the regular expression meaning, use glob2rx() to convert the wildcard pattern into a useful regular expression:

> glob2rx("blue*")
[1] "^blue"

The returned object is a regular expression.
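A few more conversions may help show what glob2rx() is doing. This is a small sketch using only glob2rx() and grepl() from base R; the patterns other than "blue*" are illustrative and do not come from the question.

```r
# Wildcard (glob) patterns and the regular expressions glob2rx() produces
rx_star  <- glob2rx("blue*")                     # trailing ".*" is trimmed by default
rx_quest <- glob2rx("blue?")                     # "?" becomes "." (exactly one character)
rx_ext   <- glob2rx("*.txt")                     # the literal dot is escaped
rx_full  <- glob2rx("blue*", trim.tail = FALSE)  # keep the explicit ".*$"

print(c(rx_star, rx_quest, rx_ext, rx_full))

# Why the conversion matters: as a *regular expression*, "blue*" means
# "blu" followed by zero or more "e", so it also matches "blu".
as_regex <- grepl("blue*", "blu")
as_glob  <- grepl(glob2rx("blue*"), "blu")
print(c(as_regex, as_glob))
```

Passing the raw wildcard string straight to grep() therefore tends to match more than intended, which is the reason for going through glob2rx() first.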

Given your data:

x <- c('red','blue1','blue2', 'red2')

we can pattern match using grep() or similar tools:

> grx <- glob2rx("blue*")
> grep(grx, x)
[1] 2 3
> grep(grx, x, value = TRUE)
[1] "blue1" "blue2"
> grepl(grx, x)
[1] FALSE  TRUE  TRUE FALSE

As for the row-selection problem you posted:

> a <- data.frame(x =  c('red','blue1','blue2', 'red2'))
> with(a, a[grepl(grx, x), ])
[1] blue1 blue2
Levels: blue1 blue2 red red2
> with(a, a[grep(grx, x), ])
[1] blue1 blue2
Levels: blue1 blue2 red red2

or via subset():

> with(a, subset(a, subset = grepl(grx, x)))
      x
2 blue1
3 blue2
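Note that the a[rows, ] calls return a bare vector rather than a data frame, because a has only one column and `[` drops the dimension by default. The sketch below, which assumes the same a and grx as above, shows one way to keep the data frame shape. On recent versions of R the column is a character vector rather than a factor, so the printed output may differ slightly from the factor output shown earlier.

```r
a   <- data.frame(x = c("red", "blue1", "blue2", "red2"))
grx <- glob2rx("blue*")

# Default indexing: a single-column selection collapses to a vector
as_vector <- a[grepl(grx, a$x), ]

# drop = FALSE keeps the result as a data frame
as_frame <- a[grepl(grx, a$x), , drop = FALSE]

# subset() evaluates the condition inside `a`, so `x` can be used directly
# and with() is not needed
via_subset <- subset(a, grepl(grx, x))

print(is.data.frame(as_vector))
print(as_frame)
print(via_subset)
```

subset() is convenient interactively, while explicit indexing with drop = FALSE is usually the more predictable choice inside functions.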

Hope that explains what glob2rx() does and how to use it?


You're on the right track - the keyword you should be googling is Regular Expressions. R does support them in a more direct way than this using grep() and a few other alternatives.

Here's a detailed discussion: http://www.regular-expressions.info/rlanguage.html
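As a rough illustration of using regular expressions directly, without going through glob2rx(), here is a short sketch. It assumes the same example vector as the question; the specific patterns are only examples.

```r
x <- c("red", "blue1", "blue2", "red2")

# "^" anchors the match at the start of the string
starts_blue <- grep("^blue", x, value = TRUE)

# "$" anchors at the end; [0-9] matches a single digit
ends_digit <- grepl("[0-9]$", x)

# Both anchors together: "blue" followed only by one or more digits
blue_number <- grepl("^blue[0-9]+$", x)

print(starts_blue)
print(ends_digit)
print(blue_number)
```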