R data.table change R names


If you don't supply both old and new, you won't hit the problem. However, that's not the real issue. In a base data.frame you can't have columns of the same name by default (check.names = TRUE makes them unique), so...

# What you actually get...
DT = data.frame(a=1:2, a=1:2); names(DT)
#[1] "a"   "a.1"

But it seems that in data.table you can have columns of the same name...

DT = data.table(a=1:2, a=1:2); names(DT)
#[1] "a" "a"

But setnames throws an error, I guess because it doesn't know which column a refers to when both columns are called a. You get no error when going the data.frame to data.table route because you do not have duplicated column names.
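A minimal sketch of that ambiguity, assuming data.table is installed. Depending on the data.table version, the by-name call may be reported as an error or as a warning, so the handler below catches either; `msg` is just an illustrative variable name.

```r
library(data.table)

DT <- data.table(a = 1:2, a = 1:2)   # duplicate names are allowed here

# Renaming by name is ambiguous: which "a" is meant?
# Catch whatever condition data.table signals (error or warning).
msg <- tryCatch(
  {
    setnames(DT, "a", "b")
    NULL                             # reached only if nothing was signalled
  },
  error   = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w)
)
print(msg)
```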

Firstly, I'd say don't create columns with the same name; it is a really bad idea if you plan to use your data.table programmatically (but, as @MatthewDowle points out in the comments, allowing it is a design choice that gives the user maximum freedom in data.table).

If you need to do it, call setnames with just the old argument, which is treated as the vector of new names when new is not given. If you pass both old names and a vector of new names, the old names are looked up and each is changed to the corresponding new name (so old and new must be the same length when setnames is called with three arguments). setnames will catch any ambiguities via:

if (any(duplicated(old)))
    stop("Some duplicates exist in 'old': ",
         paste(old[duplicated(old)], collapse = ","))
if (any(duplicated(names(x))))
    stop("'old' is character but there are duplicate column names: ",
         paste(names(x)[duplicated(names(x))], collapse = ","))

When just old is supplied setnames will reassign the names from old to the columns of DT column-wise using .Call(Csetcharvec, names(x), seq_along(names(x)), old), so from first to last...

DT = data.table(a=1:2, a=1:2)
setnames(DT, c("b","b") )
DT
#   b b
#1: 1 1
#2: 2 2

Addition from Matthew as requested. In ?setnames there's some background:

It isn't good programming practice, in general, to use column numbers rather than names. This is why setkey and setkeyv only accept column names, and why old in setnames() is recommended to be names. If you use column numbers then bugs (possibly silent) can more easily creep into your code as time progresses if changes are made elsewhere in your code; e.g., if you add, remove or reorder columns in a few months time, a setkey by column number will then refer to a different column, possibly returning incorrect results with no warning. (A similar concept exists in SQL, where "select * from ..." is considered poor programming style when a robust, maintainable system is required.) If you really wish to use column numbers, it's possible but deliberately a little harder; e.g., setkeyv(DT,colnames(DT)[1:2]).

[As of July 2017, the note above no longer appears in ?setnames, but the issue is discussed near the top of the FAQ, vignette('datatable-faq').]
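To make the point about column numbers concrete, here is a small sketch (the table and the column names `id`, `value` and `identifier` are made up for illustration). It shows a positional rename silently hitting the wrong column after an upstream reorder, while the by-name rename is unaffected.

```r
library(data.table)

# By position: silently hits the wrong column once the order changes.
DT1 <- data.table(id = 1:2, value = c(10, 20))
setcolorder(DT1, c("value", "id"))   # an upstream change reorders columns
setnames(DT1, 1, "identifier")       # meant "id", but column 1 is now "value"
names(DT1)
# [1] "identifier" "id"

# By name: unaffected by the reorder.
DT2 <- data.table(id = 1:2, value = c(10, 20))
setcolorder(DT2, c("value", "id"))
setnames(DT2, "id", "identifier")
names(DT2)
# [1] "value"      "identifier"
```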

So the idea of setnames is to change one column name really easily, by name.

setnames(DT, "oldname", "newname")

If "oldname" is not a column name or there's any ambiguity over what you intend (either in the data now or in a few months time after your colleagues have changed the source database or other code upstream or have passed their own data to your module) then data.table will catch it for you. That's actually quite hard to do in base as easily and as well as setnames does it (including the safety checks).
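As a rough illustration of that safety check, the sketch below asks setnames to rename a column that does not exist and captures the resulting error instead of letting it stop the script. The table and the name `no_such_column` are hypothetical, and the exact wording of the message may differ between data.table versions.

```r
library(data.table)

DT <- data.table(a = 1:2, b = 3:4)

# "no_such_column" is not in names(DT), so setnames refuses to guess.
res <- tryCatch(
  setnames(DT, "no_such_column", "c"),
  error = function(e) conditionMessage(e)
)
print(res)   # the error message text
names(DT)    # the column names are left untouched
```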


setnames can be used for changing multiple column names at once:

setnames(DT, old = c("oldname1", "oldname2", "oldname3"), new = c("newname1", "newname2", "newname3"))
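A small worked example on a made-up table (the column names here are illustrative only). Each entry of old is paired with the entry of new at the same position, and the order of old does not have to follow the column order of the table.

```r
library(data.table)

DT <- data.table(x = 1:2, y = 3:4, z = 5:6)

# old[1] -> new[1], old[2] -> new[2]; "y" is not mentioned, so it is kept.
setnames(DT, old = c("z", "x"), new = c("last", "first"))
names(DT)
# [1] "first" "y"     "last"
```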