In R, what exactly is the problem with having variables with the same name as base R functions?
There isn't really one. When looking for a function, R will not normally consider objects that are not functions:
> mean(1:10)
[1] 5.5
> mean <- 1
> mean(1:10)
[1] 5.5
> rm(mean)
> mean(1:10)
[1] 5.5
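As a rough illustration of that lookup rule, get() can be asked to return only functions via its mode argument. This is a sketch that mimics the behaviour from user code, not the evaluator's actual implementation; the names plain and as_fun are made up for the example.

```r
mean <- 1                                  # a non-function object named "mean"

# Plain lookup returns the first object called "mean": our variable
plain <- get("mean")

# Function-mode lookup skips non-functions and carries on to base::mean
as_fun <- get("mean", mode = "function")

is.function(plain)              # FALSE
is.function(as_fun)             # TRUE
identical(as_fun, base::mean)   # TRUE

rm(mean)                                   # clean up the masking variable
```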
The examples shown by @Joris and @Sacha are where poor coding catches you out. One better way to write foo is:
foo <- function(x, fun) {
  fun <- match.fun(fun)
  fun(x)
}
Which when used gives:
> foo(1:10, mean)
[1] 5.5
> mean <- 1
> foo(1:10, mean)
[1] 5.5
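A side benefit of match.fun() is that the same foo also accepts a function name given as a character string, and fails early with an error raised by match.fun() itself when handed something that cannot be resolved to a function. A small sketch (res_name, res_anon and bad are names invented for this example):

```r
foo <- function(x, fun) {
  fun <- match.fun(fun)   # resolve to a function, or signal an error
  fun(x)
}

res_name <- foo(1:10, "mean")                          # name given as a string
res_anon <- foo(1:10, function(v) sum(v) / length(v))  # anonymous function

# A plain number is neither a function, a string nor a symbol,
# so match.fun() signals an error; we capture its message here
bad <- tryCatch(foo(1:10, 42), error = function(e) conditionMessage(e))
```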
There are situations where this will catch you out, and @Joris's example with na.omit is one, which, IIRC, happens because of the usual non-standard evaluation used in lm().
Several Answers have also conflated the T vs TRUE issue with the masking of functions issue. As T and TRUE are not functions, that is a little outside the scope of @Andrie's Question.
The problem is not so much the computer, but the user. In general, code can become a lot harder to debug. Typos are made very easily, so if you do:
c <- c("Some text", "Second", "Third")
c[3]
c(3)
You get the correct results. But if you slip somewhere in your code and type c(3) instead of c[3], finding the error will not be that easy.
The scoping can also lead to very confusing error reports. Take the following flawed function:
my.foo <- function(x){
  if(x) c <- 1
  c + 1
}
> my.foo(TRUE)
[1] 2
> my.foo(FALSE)
Error in c + 1 : non-numeric argument to binary operator
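To see where that error message comes from, here is a small diagnostic sketch. inspect.c is a hypothetical helper, not part of the original example: instead of computing c + 1 it reports whether a local c exists and whether the name c currently resolves to a function.

```r
# Hypothetical diagnostic variant of my.foo()
inspect.c <- function(x) {
  if (x) c <- 1
  list(
    # Is there a variable called `c` in this function's own frame?
    local  = exists("c", envir = environment(), inherits = FALSE),
    # Does the name `c`, looked up as a variable, give a function?
    is_fun = is.function(c)
  )
}

r_true  <- inspect.c(TRUE)    # local c exists and is the number 1
r_false <- inspect.c(FALSE)   # no local c, so `c` is the base function
```

When x is FALSE, no local c is created, so c + 1 tries to add 1 to the base function c(), hence the "non-numeric argument" message rather than an "object not found" error.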
With more complex functions, this can lead you on a debugging trail leading nowhere. If you replace c in the above function with a name that is not defined anywhere, say d, the error will read "object 'd' not found". That will lead you to your coding error a lot faster.
Next to that, it can lead to rather confusing code. Code like c(c+c(a,b,c)) asks more from the brain than c(d+c(a,b,d)). Again, this is a trivial example, but it can make a difference.
And obviously, you can get errors too. When you expect a function, you won't get it, which can give rise to another set of annoying bugs:
my.foo <- function(x,fun) fun(x)
my.foo(1,sum)
[1] 1
my.foo(1,c)
Error in my.foo(1, c) : could not find function "fun"
A more realistic (and real-life) example of how this can cause trouble:
x <- c(1:10,NA)
y <- c(NA,1:10)
lm(x~y,na.action=na.omit)
# ... correct output ...
na.omit <- TRUE
lm(x~y,na.action=na.omit)
Error in model.frame.default(formula = x ~ y, na.action = na.omit, drop.unused.levels = TRUE) : attempt to apply non-function
Try figuring out what's wrong here if na.omit <- TRUE occurs 50 lines up in your code...
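One defensive option, offered as a sketch rather than as what the example's author did, is to name the function through its namespace, so a stray variable called na.omit cannot get in the way:

```r
x <- c(1:10, NA)
y <- c(NA, 1:10)
na.omit <- TRUE                     # the accidental masking from the example

# stats::na.omit always refers to the function exported by the stats package
fit <- lm(x ~ y, na.action = stats::na.omit)
n_used <- nobs(fit)                 # 9 complete cases remain

rm(na.omit)                         # clean up the masking variable
```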
Answer edited after comment of @Andrie to include the example of confusing error reports
R is very robust to this, but you can think of ways to break it. For example, consider this function:
foo <- function(x,fun) fun(x)
Which simply applies fun to x. Not the prettiest way to do this, but you might encounter it in someone's script. This works for mean():
> foo(1:10,mean)
[1] 5.5
But if I assign a new value to mean it breaks:
mean <- 1
foo(1:10,mean)
Error in foo(1:10, mean) : could not find function "fun"
This will happen very rarely, but it might happen. It is also very confusing for people if the same thing means two things:
mean(mean)
Since it is trivial to use any other name you want, why not use a name different from those of base R functions? Also, for some R objects this becomes even more important. Think of reassigning the '+' function! Another good example is reassignment of T and F, which can break so many scripts.
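A minimal sketch of the T and F point: T and F are ordinary variables in base R and can be overwritten, whereas TRUE and FALSE are reserved words. The names t_val, t_restored and assign_err are made up for the example.

```r
# T is just a variable, so it can be reassigned
T <- 0
t_val <- isTRUE(T)        # FALSE: T no longer means TRUE
rm(T)                     # remove our copy; base's T is visible again
t_restored <- isTRUE(T)   # TRUE

# TRUE is a reserved word; trying to assign to it signals an error
assign_err <- tryCatch({ TRUE <- 0; "assigned" },
                       error = function(e) "error")
```

Any script that writes T or F as shorthand will silently misbehave while such a reassignment is in effect, which is why spelling out TRUE and FALSE is the safer habit.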