How do you delete a column by name in data.table?



Any of the following will remove column foo from the data.table df3:

# Method 1 (and preferred as it takes 0.00s even on a 20GB data.table)
df3[, foo := NULL]
df3[, c("foo", "bar") := NULL]  # remove two columns

myVar = "foo"
df3[, (myVar) := NULL]   # lookup myVar contents

# Method 2a -- A safe idiom for excluding (possibly multiple)
# columns matching a regex
df3[, grep("^foo$", colnames(df3)) := NULL]

# Method 2b -- An alternative to 2a, also "safe" in the sense described below
df3[, which(grepl("^foo$", colnames(df3))) := NULL]

data.table also supports the following syntax:

# Method 3 (could then assign the result back to df3)
df3[, !"foo"]

though if you were actually wanting to remove column "foo" from df3 (as opposed to just printing a view of df3 minus column "foo") you'd really want to use Method 1 instead.
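The difference between the two can be checked directly. The following is a minimal sketch, assuming a small made-up table (the column values are illustrative only): Method 3 returns a new data.table and leaves df3 untouched, while Method 1 removes the column from df3 itself.

```r
library(data.table)

# Hypothetical example data; only the column names matter here.
df3 <- data.table(foo = 1:3, bar = letters[1:3], baz = c(TRUE, FALSE, TRUE))

# Method 3: returns a new data.table without "foo"; df3 is unchanged.
res <- df3[, !"foo"]
names(res)   # "bar" "baz"
names(df3)   # "foo" "bar" "baz"

# Method 1: removes "foo" from df3 by reference, no assignment needed.
df3[, foo := NULL]
names(df3)   # "bar" "baz"
```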

(Do note that if you use a method relying on grep() or grepl(), you need to set pattern="^foo$" rather than "foo", if you don't want columns with names like "fool" and "buffoon" (i.e. those containing foo as a substring) to also be matched and removed.)
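To make the anchoring point concrete, here is a small sketch with hypothetical column names (foo, fool, buffoon, bar) picked only to trigger the substring problem:

```r
library(data.table)

# Hypothetical column names chosen to show the substring problem.
dt <- data.table(foo = 1, fool = 2, buffoon = 3, bar = 4)

grep("foo", colnames(dt))     # 1 2 3: also matches "fool" and "buffoon"
grep("^foo$", colnames(dt))   # 1: matches only the column named exactly "foo"

# Method 2a with the anchored pattern removes just "foo".
dt[, grep("^foo$", colnames(dt)) := NULL]
names(dt)   # "fool" "buffoon" "bar"
```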

Less safe options, fine for interactive use:

The idioms below will also work if df3 contains a column matching "foo", but can fail in a probably unexpected way if it does not. If, for instance, you use one of them to search for the non-existent column "bar", you can end up with a zero-row data.table.

As a consequence, they are best suited for interactive use where one might, e.g., want to display a data.table minus any columns with names containing the substring "foo". For programming purposes (or if you want to actually remove the column(s) from df3 rather than from a copy of it), Methods 1, 2a, and 2b are the best options.

# Method 4:
df3[, .SD, .SDcols = !patterns("^foo$")]

Lastly, there are approaches using with=FALSE. data.table is gradually moving away from this argument, so it is now discouraged where you can avoid it; they are shown here so you know the option exists in case you really do need it:

# Method 5a (like Method 3)
df3[, !"foo", with=FALSE]

# Method 5b (like Method 4)
df3[, !grep("^foo$", names(df3)), with=FALSE]

# Method 5c (another like Method 4)
df3[, !grepl("^foo$", names(df3)), with=FALSE]
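One way to see where a no-match problem can come from is to look at what the index expressions themselves evaluate to. This base R sketch uses a hypothetical vector of column names with no "bar" column; how data.table then handles each index is not shown here and may depend on the version.

```r
# Hypothetical column names; note there is no "bar".
cols <- c("foo", "baz")

grep("^bar$", cols)     # integer(0): no positions matched
!grep("^bar$", cols)    # logical(0): an empty vector, not a keep-everything mask
!grepl("^bar$", cols)   # TRUE TRUE: one flag per column
```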


You can also use set for this, which avoids the overhead of [.data.table in loops:

dt <- data.table( a=letters, b=LETTERS, c=seq(26), d=letters, e=letters )
set( dt, j=c(1L,3L,5L), value=NULL )
> dt[1:5]
   b d
1: A a
2: B b
3: C c
4: D d
5: E e

If you want to do it by column name, which(colnames(dt) %in% c("a","c","e")) should work for j.
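A sketch of that by-name variant, reusing the same example table. The variable drop_idx is just an illustrative name; as far as I know set() also accepts a character vector of column names for j directly, but check ?set for your data.table version.

```r
library(data.table)

dt <- data.table(a = letters, b = LETTERS, c = seq(26), d = letters, e = letters)

# Translate column names into integer positions for j.
drop_idx <- which(colnames(dt) %in% c("a", "c", "e"))
set(dt, j = drop_idx, value = NULL)

names(dt)   # "b" "d"
```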


I simply do it in the data frame kind of way:

DT$col = NULL

Works fast and as far as I could see doesn't cause any problems.

UPDATE: not the best method if your DT is very large, as using the $<- operator will lead to object copying. So better use:

DT[, col:=NULL]