
Why are loops slow in R?


It's not always the case that loops are slow and apply is fast. There's a nice discussion of this in the May, 2008, issue of R News:

Uwe Ligges and John Fox. R Help Desk: How can I avoid this loop or make it faster? R News, 8(1):46-50, May 2008.

In the section "Loops!" (starting on pg 48), they say:

Many comments about R state that using loops is a particularly bad idea. This is not necessarily true. In certain cases, it is difficult to write vectorized code, or vectorized code may consume a huge amount of memory.

They further suggest:

  • Initialize new objects to full length before the loop, rather than increasing their size within the loop.
  • Do not do things in a loop that can be done outside the loop.
  • Do not avoid loops simply for the sake of avoiding loops.
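The second suggestion above can be illustrated with a minimal sketch. The data and variable names here (x, scaled_slow, scaled_fast) are assumptions made up for illustration and do not come from the R News article:

```r
# Hypothetical data: names and sizes are illustrative only.
set.seed(1)
x <- runif(1000)

# Version 1: recomputes mean(x) and sd(x) on every iteration,
# even though neither depends on i.
scaled_slow <- numeric(length(x))
for (i in seq_along(x)) {
    scaled_slow[i] <- (x[i] - mean(x)) / sd(x)
}

# Version 2: compute the loop-invariant values once, outside the loop.
x_mean <- mean(x)
x_sd <- sd(x)
scaled_fast <- numeric(length(x))
for (i in seq_along(x)) {
    scaled_fast[i] <- (x[i] - x_mean) / x_sd
}

# Both versions give the same answer; only the amount of work differs.
stopifnot(isTRUE(all.equal(scaled_slow, scaled_fast)))
```

In this particular case the fully vectorized form (x - mean(x)) / sd(x) would be simpler still; the loop is kept only to show the effect of hoisting work out of it.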

They have a simple example where a for loop takes 1.3 sec but apply runs out of memory.


Loops in R are slow for the same reason any interpreted language is slow: every operation carries around a lot of extra baggage.

Look at R_execClosure in eval.c (this is the function called to call a user-defined function). It's nearly 100 lines long and performs all sorts of operations -- creating an environment for execution, assigning arguments into the environment, etc.

Think how much less happens when you call a function in C (push args on to stack, jump, pop args).

So that is why you get timings like these (as joran pointed out in the comment, it's not actually apply that's being fast; it's the internal C loop in mean that's being fast. apply is just regular old R code):

A = matrix(as.numeric(1:100000))

Using a loop: 0.342 seconds:

system.time({
    Sum = 0
    for (i in seq_along(A)) {
        Sum = Sum + A[[i]]
    }
    Sum
})

Using sum: unmeasurably small:

sum(A)

It's a little disconcerting because, asymptotically, the loop is just as good as sum; there's no fundamental reason it should be slow; it's just doing a lot of extra work on each iteration.
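To look at the per-call overhead of an R-level function in isolation, here is a rough sketch. The helper add() and the variable names are hypothetical, and no particular timings are claimed: results depend on the machine and R version, and the byte-code compiler may narrow the gap.

```r
# add() is a hypothetical helper used only to isolate call overhead.
add <- function(a, b) a + b

n <- 100000

# Inline arithmetic in the loop body.
inline_time <- system.time({
    total_inline <- 0
    for (i in seq_len(n)) {
        total_inline <- total_inline + i
    }
})

# Same arithmetic, but routed through an R-level function call,
# so each iteration also pays for setting up and tearing down a closure call.
call_time <- system.time({
    total_call <- 0
    for (i in seq_len(n)) {
        total_call <- add(total_call, i)
    }
})

# The two loops compute the same value; only the elapsed time can differ.
stopifnot(total_inline == total_call)
print(c(inline = inline_time[["elapsed"]], call = call_time[["elapsed"]]))
```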

So consider:

# 0.370 seconds
system.time({
    I = 0
    while (I < 100000) {
        10
        I = I + 1
    }
})

# 0.743 seconds -- double the time just adding parentheses
system.time({
    I = 0
    while (I < 100000) {
        ((((((((((10))))))))))
        I = I + 1
    }
})

(That example was discovered by Radford Neal)

The parentheses cost time because ( in R is an operator, i.e. an ordinary function, and it requires a name lookup every time you use it:

> `(` = function(x) 2
> (3)
[1] 2

Or, in general, interpreted operations (in any language) have more steps. Of course, those steps provide benefits as well: you couldn't do that ( trick in C.
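A small sketch of the point about ( being looked up by name like any other function. The function shadow_paren() is a hypothetical name used only for illustration; it confines the redefinition to a local scope so the global meaning of ( is left alone:

```r
# The parsed expression (3) is a call whose function is the symbol `(`.
expr <- quote((3))
stopifnot(is.call(expr))
stopifnot(identical(expr[[1]], as.name("(")))

# The base definition is a primitive function that returns its argument.
stopifnot(is.primitive(`(`))

# Redefining `(` inside a function shadows the base version only there.
# shadow_paren() is a hypothetical name used for illustration.
shadow_paren <- function() {
    `(` <- function(x) 2
    (3)
}
stopifnot(shadow_paren() == 2)

# Outside that function the normal meaning is untouched.
stopifnot((3) == 3)
```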


The only answer to the question posed is this: loops are not slow if what you need to do is iterate over a set of data applying some function, and that function or operation is not vectorized. A for() loop will, in general, be as quick as apply(), but possibly a little slower than an lapply() call. The last point is well covered on SO, for example in this Answer, and applies if the code involved in setting up and operating the loop is a significant part of the overall computational burden of the loop.

Many people think for() loops are slow because they, the users, are writing bad code. In general (though there are several exceptions), if you need to expand or grow an object, each growth step also involves copying, so you pay both for allocating the larger object and for copying the old contents into it. This is not restricted to loops, but if you copy and grow at each iteration, the loop will of course be slow because you incur many copy/grow operations.

The general idiom for using for() loops in R is that you allocate the storage you require before the loop starts, and then fill in the object thus allocated. If you follow that idiom, loops will not be slow. This is what apply() manages for you, but it is just hidden from view.
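A minimal sketch of that idiom next to the grow-as-you-go pattern it replaces. The variable names (grown, filled) and the squared-index computation are placeholders chosen for illustration:

```r
n <- 10000

# Growing the result: each c() call allocates a new, longer vector
# and copies the old contents into it.
grown <- numeric(0)
for (i in seq_len(n)) {
    grown <- c(grown, i^2)
}

# Preallocating: the storage is created once, before the loop,
# and each iteration only fills in one element.
filled <- numeric(n)
for (i in seq_len(n)) {
    filled[i] <- i^2
}

# Same result either way; the difference is in allocation and copying work.
stopifnot(identical(grown, filled))
```

Wrapping each loop in system.time() and increasing n should make the cost of growing visible, though the exact numbers depend on the machine and R version.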

Of course, if a vectorised function exists for the operation you are implementing with the for() loop, use it instead of the loop. Likewise, don't use apply() etc. if a vectorised function exists (e.g. apply(foo, 2, mean) is better performed via colMeans(foo)).
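As a quick check that the two forms agree, here is a sketch in which foo is a made-up numeric matrix (its contents and dimensions are assumptions for illustration):

```r
# foo is a hypothetical numeric matrix standing in for real data.
set.seed(42)
foo <- matrix(rnorm(2000), nrow = 200, ncol = 10)

via_apply <- apply(foo, 2, mean)   # calls mean() once per column from R code
via_colMeans <- colMeans(foo)      # one call that loops in compiled code

# The results agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(via_apply, via_colMeans)))
```

all.equal() is used rather than identical() because the two functions may accumulate the sums slightly differently.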