
Select rows from a data frame based on values in a vector


Have a look at ?"%in%".

dt[dt$fct %in% vc,]

which results in:

   fct X
1    a 2
3    c 3
5    c 5
7    a 7
9    c 9
10   a 1
12   c 2
14   c 4

You could also use ?is.element:

dt[is.element(dt$fct, vc),]
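As a quick check that the two approaches select the same rows, here is a minimal base R sketch. The data frame `dt_demo` and the vector `vc_demo` are made-up stand-ins, since the question's full data is not reproduced here.

```r
# Hypothetical data: stand-ins for the dt and vc used in the answers
dt_demo <- data.frame(
  fct = factor(c("a", "b", "c", "b", "c")),
  X   = c(2, 4, 3, 4, 5)
)
vc_demo <- c("a", "c", "f")   # "f" does not occur in fct

# Both calls return a logical vector with one entry per row of dt_demo
keep_in <- dt_demo$fct %in% vc_demo
keep_el <- is.element(dt_demo$fct, vc_demo)

# The two vectors are the same, so either can be used for row subsetting
identical(keep_in, keep_el)

# Keep only the rows whose fct value appears in vc_demo
dt_demo[keep_in, ]
```

Values in the lookup vector that never occur in the column (here "f") are simply ignored; they do not produce extra rows.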


Similar to above, using filter from dplyr:

library(dplyr)
filter(dt, fct %in% vc)


Another option would be to use a keyed data.table:

library(data.table)
setDT(dt, key = 'fct')[J(vc)]  # or: setDT(dt, key = 'fct')[.(vc)]

which results in:

   fct X
1:   a 2
2:   a 7
3:   a 1
4:   c 3
5:   c 5
6:   c 9
7:   c 2
8:   c 4

What this does:

  • setDT(dt, key = 'fct') transforms the data.frame to a data.table (which is an enhanced form of a data.frame) with the fct column set as key.
  • Next, you can subset on the key by passing the vc vector as [J(vc)].

NOTE: when the key is a factor/character variable, you can also use setDT(dt, key = 'fct')[vc], but that won't work when vc is a numeric vector. A numeric vc that is not wrapped in J() or .() is interpreted as a vector of row indices rather than as key values.
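The difference described in the note can be seen in a small sketch with a numeric key. The table `dt_num` and the vector `vc_num` are hypothetical and chosen only for illustration; the example assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical table keyed on a numeric column
dt_num <- data.table(id = c(2, 3, 5, 7), val = c("a", "b", "c", "d"), key = "id")
vc_num <- c(2, 3)

# A bare numeric vector is read as row numbers: this picks rows 2 and 3
by_row <- dt_num[vc_num]
by_row$id

# Wrapped in J(), the same vector is matched against the key column id
by_key <- dt_num[J(vc_num)]
by_key$id
```

Here `by_row` holds the rows with id 3 and 5 (the second and third rows), while `by_key` holds the rows with id 2 and 3.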

A more detailed explanation of the concept of keys and subsetting can be found in the vignette Keys and fast binary search based subset.

An alternative as suggested by @Frank in the comments:

setDT(dt)[J(vc), on=.(fct)]

When vc contains values that are not present in dt, you'll need to add nomatch = 0:

setDT(dt, key = 'fct')[J(vc), nomatch = 0]

or:

setDT(dt)[J(vc), on=.(fct), nomatch = 0]
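To see what nomatch = 0 changes, here is a minimal sketch with made-up data. The names `dt_ex` and `vc_ex` are hypothetical, and the example assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical keyed table and a lookup vector with one value ("z") missing from fct
dt_ex <- data.table(fct = c("a", "b", "c", "a"), X = 1:4, key = "fct")
vc_ex <- c("a", "z")

# Default behaviour: the unmatched value "z" comes back as a row with X = NA
with_na <- dt_ex[J(vc_ex)]
with_na

# With nomatch = 0, rows for unmatched values are dropped
matched_only <- dt_ex[J(vc_ex), nomatch = 0]
matched_only
```

In this sketch `with_na` has three rows (two for "a" plus an NA row for "z"), and `matched_only` has only the two "a" rows.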