In R, what exactly is the problem with having variables with the same name as base R functions?


There isn't really one. When looking for a function, R will normally skip over non-function objects with the same name:

> mean(1:10)
[1] 5.5
> mean <- 1
> mean(1:10)
[1] 5.5
> rm(mean)
> mean(1:10)
[1] 5.5
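Another way to sidestep masking entirely (an illustration of mine, not part of the original answer) is to qualify the call with its namespace:

```r
mean <- 1           # mask base::mean with a numeric value
base::mean(1:10)    # the namespace-qualified call ignores the mask
# [1] 5.5
rm(mean)            # remove the mask; a plain mean(1:10) resolves to the function again
```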

The examples shown by @Joris and @Sacha are cases where poor coding catches you out. A better way to write foo is:

foo <- function(x, fun) {
    fun <- match.fun(fun)
    fun(x)
}

Which when used gives:

> foo(1:10, mean)
[1] 5.5
> mean <- 1
> foo(1:10, mean)
[1] 5.5
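As a side benefit (a small sketch of mine), match.fun also accepts the function's name as a character string, which the naive fun(x) version cannot handle:

```r
foo <- function(x, fun) {
    fun <- match.fun(fun)  # resolve a function, or a name, to the function itself
    fun(x)
}

foo(1:10, "mean")          # passing the name as a string also works
# [1] 5.5
foo(1:10, median)
# [1] 5.5
```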

There are situations where this will catch you out, and @Joris's example with na.omit is one; IIRC, that happens because of the non-standard evaluation used in lm().

Several Answers have also conflated the T vs TRUE issue with the masking of functions issue. As T and TRUE are not functions that is a little outside the scope of @Andrie's Question.


The problem is not so much the computer as the user. In general, code can become a lot harder to debug. Typos are made very easily, so if you do:

c <- c("Some text", "Second", "Third")
c[3]
c(3)

You get the correct results. But if you slip somewhere in your code and type c(3) instead of c[3], finding the error will not be that easy.

The scoping can also lead to very confusing error reports. Take the following flawed function:

my.foo <- function(x){
    if(x) c <- 1
    c + 1
}

> my.foo(TRUE)
[1] 2
> my.foo(FALSE)
Error in c + 1 : non-numeric argument to binary operator

With more complex functions, this can lead you on a debugging trail going nowhere. If you replace c with another name in the above function, the error will read "object not found", which leads you to your coding error a lot faster.
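To illustrate (a sketch of mine, using a fresh name y rather than reusing the parameter x), the non-masked version fails with an error that points straight at the real problem:

```r
my.foo <- function(x) {
    if (x) y <- 1   # y only exists when x is TRUE
    y + 1
}

my.foo(TRUE)
# [1] 2
my.foo(FALSE)
# Error: object 'y' not found -- immediately flags the missing assignment
```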

Besides that, it can lead to rather confusing code. Code like c(c+c(a,b,c)) asks more from the brain than c(d+c(a,b,d)). Again, this is a trivial example, but it can make a difference.

And obviously, you can get errors too. When you expect a function, you won't get one, which can give rise to another set of annoying bugs:

my.foo <- function(x, fun) fun(x)
my.foo(1, sum)
[1] 1
my.foo(1, c)
Error in my.foo(1, c) : could not find function "fun"

A more realistic (and real-life) example of how this can cause trouble:

x <- c(1:10, NA)
y <- c(NA, 1:10)
lm(x ~ y, na.action = na.omit)
# ... correct output ...
na.omit <- TRUE
lm(x ~ y, na.action = na.omit)
Error in model.frame.default(formula = x ~ y, na.action = na.omit,
    drop.unused.levels = TRUE) : attempt to apply non-function

Try figuring out what's wrong here if na.omit <- TRUE occurs 50 lines up in your code...
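One defensive option (my addition, not part of the original answer) is to namespace-qualify the argument, which makes the call immune to the mask:

```r
x <- c(1:10, NA)
y <- c(NA, 1:10)
na.omit <- TRUE                               # the mask from the example above
fit <- lm(x ~ y, na.action = stats::na.omit)  # stats::na.omit cannot be masked
coef(fit)
# intercept 1, slope 1: x is exactly y + 1 on the complete cases
```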

Answer edited after @Andrie's comment to include the example of confusing error reports.


R is very robust to this, but you can think of ways to break it. For example, consider this function:

foo <- function(x,fun) fun(x)

Which simply applies fun to x. It is not the prettiest way to do this, but you might encounter it in someone's script. This works for mean():

> foo(1:10, mean)
[1] 5.5

But if I assign a new value to mean, it breaks:

mean <- 1
foo(1:10, mean)
Error in foo(1:10, mean) : could not find function "fun"

This will happen very rarely, but it might happen. It is also very confusing for readers when the same name means two different things:

mean(mean)

Since it is trivial to use any other name you want, why not use a different name than the base R functions? For some R variables this becomes even more important: think of reassigning the '+' function! Another good example is reassigning T and F, which can break many scripts.
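A minimal sketch of the T pitfall (my example): unlike TRUE, which is a reserved word, T is an ordinary variable in base bound to TRUE, so it can be silently masked.

```r
T <- FALSE          # mask the binding from base
isTRUE(T)           # FALSE -- any script relying on T meaning TRUE now misbehaves
rm(T)               # removing the mask restores the binding from base
isTRUE(T)           # TRUE again
# TRUE itself cannot be reassigned: `TRUE <- FALSE` is a syntax-level error
```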