Extracting specific columns from a data frame


You can subset using a vector of column names. I strongly prefer this approach over those that treat column names as if they are object names (e.g. subset()), especially when programming in functions, packages, or applications.

# data for reproducible example
# (and to avoid confusion from trying to subset `stats::df`)
df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])
# subset
df[c("A","B","E")]

Note there's no comma (i.e. it's not df[,c("A","B","E")]). That's because the comma form drops to a vector when only one column is selected: df[,"A"] returns a vector, not a data frame. But df["A"] will always return a data frame.

str(df["A"])
## 'data.frame':    1 obs. of  1 variable:
##  $ A: int 1
str(df[,"A"])  # vector
##  int 1
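The advantage of character vectors is easiest to see inside a function, where the column names arrive as data instead of being typed in. The following is a minimal sketch; `keep_columns` is a hypothetical helper written for this illustration, not part of base R.

```r
# Sample data, same shape as the reproducible example above
df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])

# Hypothetical helper: keep only the requested columns.
# `cols` is a character vector, so it can be built or passed around at run time.
keep_columns <- function(data, cols) {
  missing_cols <- setdiff(cols, names(data))
  if (length(missing_cols) > 0) {
    stop("Unknown column(s): ", paste(missing_cols, collapse = ", "))
  }
  data[cols]  # no comma, so a single column still comes back as a data frame
}

wanted <- c("A", "B", "E")
result <- keep_columns(df, wanted)  # three-column data frame
single <- keep_columns(df, "A")     # one-column data frame, not a vector
```

Because the function never treats column names as object names, it behaves the same whether `wanted` is typed by hand or computed elsewhere.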

Thanks to David Dorchies for pointing out that df[,"A"] returns a vector instead of a data.frame, and to Antoine Fabri for suggesting a better alternative (above) to my original solution (below).

# subset (original solution--not recommended)
df[,c("A","B","E")]  # returns a data.frame
df[,"A"]             # returns a vector
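If the comma form is needed anyway (for example, when rows are being subset at the same time), passing `drop = FALSE` should keep a single-column result as a data frame. This is a small sketch of that behaviour, offered as an addition to the answer above and not as the author's own recommendation.

```r
df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])

one_col_vec <- df[, "A"]                # drops to an integer vector
one_col_df  <- df[, "A", drop = FALSE]  # stays a data frame

is.data.frame(one_col_vec)  # FALSE
is.data.frame(one_col_df)   # TRUE
```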


Using the dplyr package, if your data.frame is called df1:

library(dplyr)
df1 %>%
  select(A, B, E)

This can also be written without the %>% pipe as:

select(df1, A, B, E)
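When the column names are held in a character vector, `select()` can take them through the `all_of()` helper. The sketch below assumes dplyr 1.0.0 or later; the contents of `df1` are made up for illustration, since the answer does not define it.

```r
library(dplyr)

# Hypothetical data standing in for df1, which the answer does not define
df1 <- data.frame(A = 1:2, B = 3:4, C = 5:6, D = 7:8, E = 9:10)

# Unquoted names, as in the answer above
by_name <- select(df1, A, B, E)

# Column names stored in a character vector, wrapped in all_of()
cols <- c("A", "B", "E")
by_vector <- select(df1, all_of(cols))
```

Both calls should select the same three columns; `all_of()` raises an error if any requested name is missing from the data.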


This is the role of the subset() function:

> dat <- data.frame(A=c(1,2),B=c(3,4),C=c(5,6),D=c(7,7),E=c(8,8),F=c(9,9))
> subset(dat, select=c("A", "B"))
  A B
1 1 3
2 2 4
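The `select` argument of `subset()` also accepts bare column names, ranges of adjacent columns, and negative selections. The sketch below shows these forms on the same `dat`. Note that the R documentation describes `subset()` as a convenience function intended for interactive use, so these forms are best kept out of package or function code.

```r
dat <- data.frame(A = c(1, 2), B = c(3, 4), C = c(5, 6),
                  D = c(7, 7), E = c(8, 8), F = c(9, 9))

quoted   <- subset(dat, select = c("A", "B"))  # names as strings
unquoted <- subset(dat, select = c(A, B))      # names as bare symbols
ranged   <- subset(dat, select = A:C)          # a run of adjacent columns
dropped  <- subset(dat, select = -c(E, F))     # everything except E and F
```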