Apply multiple functions to each row of a dataframe


I think you are overcomplicating this. What is wrong with two separate apply() calls? There is, however, a far better way to do what you want that involves no looping or apply() calls at all. I'll deal with the two approaches separately, but the second solution is preferable because it is truly vectorised.

Two apply calls version

First two separate apply calls using all-Base R functions:

    df1 <- data.frame(Bar1 = c(1, 2, 3, -3, -2, -1), Bar2 = c(3, 1, 3, -2, -3, -1))
    df1 <- transform(df1, MFE = apply(df1, 1, FindMFE), MAE = apply(df1, 1, FindMAE))
    df1

Which gives:

    > df1
      Bar1 Bar2 MFE MAE
    1    1    3   3   0
    2    2    1   2   0
    3    3    3   3   0
    4   -3   -2   0  -3
    5   -2   -3   0  -3
    6   -1   -1   0  -1

OK, looping over the rows of df1 twice is perhaps a little inefficient, but even for big problems you have already spent more time thinking about how to do this cleverly in a single pass than you will save by doing it that way.
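The snippet above relies on FindMFE() and FindMAE() from the question, which aren't reproduced in this answer. As a rough stand-in so the example runs end to end, assume they return the row maximum floored at 0 and the row minimum capped at 0. The function bodies below are a guess at that behaviour, not the asker's actual code:

```r
# Hypothetical stand-ins for the question's functions (assumed behaviour):
# MFE = largest value in the row, but never below 0
# MAE = smallest value in the row, but never above 0
FindMFE <- function(x) max(x, 0, na.rm = TRUE)
FindMAE <- function(x) min(x, 0, na.rm = TRUE)

df1 <- data.frame(Bar1 = c(1, 2, 3, -3, -2, -1),
                  Bar2 = c(3, 1, 3, -2, -3, -1))

# Two passes over the rows, one per function
res <- transform(df1,
                 MFE = apply(df1, 1, FindMFE),
                 MAE = apply(df1, 1, FindMAE))
res
```

With these stand-ins the result has the same MFE and MAE columns as the output shown above.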

Using vectorised functions pmax() and pmin()

So a better way of doing this is to note the pmax() and pmin() functions and realise that they can do what each of the apply(df1, 1, FindFOO) calls was doing. For example:

    > (tmp <- with(df1, pmax(0, Bar1, Bar2, na.rm = TRUE)))
    [1] 3 2 3 0 0 0

would be MFE from your Question. This is very simple to work with if you always have exactly two columns and they are Bar1 and Bar2 (or the first two columns of df1). But it is not very general: what if you have more columns you want to compute this over? pmax(df1[, 1:2], na.rm = TRUE) won't do what we want, because it just hands back the data frame unchanged:

    > pmax(df1[, 1:2], na.rm = TRUE)
      Bar1 Bar2
    1    1    3
    2    2    1
    3    3    3
    4   -3   -2
    5   -2   -3
    6   -1   -1

The trick to getting a general solution using pmax() and pmin() is to use do.call() to arrange the calls to those two functions for us. Updating your functions to use this idea we have:

    FindMFE2 <- function(x) {
        MFE <- do.call(pmax, c(as.list(x), 0, na.rm = TRUE))
        MFE[is.infinite(MFE)] <- 0
        MFE
    }

    FindMAE2 <- function(x) {
        MAE <- do.call(pmin, c(as.list(x), 0, na.rm = TRUE))
        MAE[is.infinite(MAE)] <- 0
        MAE
    }

which give:

    > transform(df1, MFE = FindMFE2(df1), MAE = FindMAE2(df1))
      Bar1 Bar2 MFE MAE
    1    1    3   3   0
    2    2    1   2   0
    3    3    3   3   0
    4   -3   -2   0  -3
    5   -2   -3   0  -3
    6   -1   -1   0  -1

and not an apply() in sight. If you want to do this in a single step, this is now much easier to wrap:

    FindMAEandMFE2 <- function(x) {
        cbind(MFE = FindMFE2(x), MAE = FindMAE2(x))
    }

which can be used as:

    > cbind(df1, FindMAEandMFE2(df1))
      Bar1 Bar2 MFE MAE
    1    1    3   3   0
    2    2    1   2   0
    3    3    3   3   0
    4   -3   -2   0  -3
    5   -2   -3   0  -3
    6   -1   -1   0  -1
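To make the do.call() step a little more concrete, here is a small sketch of what the argument list looks like and how it copes with more than two columns and a missing value. The data frame df3 and its Bar3 column are made up purely for illustration:

```r
# Hypothetical three-column data with one NA (not from the question)
df3 <- data.frame(Bar1 = c(1, -3, NA),
                  Bar2 = c(3, -2, -1),
                  Bar3 = c(-5, 4, -2))

# Each column becomes one argument; 0 and na.rm = TRUE are appended
args <- c(as.list(df3), 0, na.rm = TRUE)

mfe <- do.call(pmax, args)   # row-wise maximum, floored at 0
mae <- do.call(pmin, args)   # row-wise minimum, capped at 0

# The do.call() form is equivalent to spelling the columns out by hand
by_hand <- pmax(df3$Bar1, df3$Bar2, df3$Bar3, 0, na.rm = TRUE)
```

Here mfe should match by_hand, and adding a fourth column would need no change to the do.call() lines.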


I show three alternative one-liners:

  • Using the each function of plyr
  • Using the plyr each function with base R
  • Using the vectorised pmin and pmax functions

Solution 1: plyr and each

The plyr package defines the each function that does what you want. From ?each: Aggregate multiple functions into a single function. This means you can solve your problem using a one-liner:

    library(plyr)
    adply(df1, 1, each(MFE = function(x) max(x, 0), MAE = function(x) min(x, 0)))
      Bar1 Bar2 MFE MAE
    1    1    3   3   0
    2    2    1   2   0
    3    3    3   3   0
    4   -3   -2   0  -3
    5   -2   -3   0  -3
    6   -1   -1   0  -1

Solution 2: each and base R

You can, of course, use each with base functions. Here is how you can use it with apply - just note that you have to transpose the results before adding to your original data.frame.

    library(plyr)
    data.frame(df1,
      t(apply(df1, 1, each(MFE = function(x) max(x, 0), MAE = function(x) min(x, 0)))))
      Bar1 Bar2 MFE MAE
    1    1    3   3   0
    2    2    1   2   0
    3    3    3   3   0
    4   -3   -2   0  -3
    5   -2   -3   0  -3
    6   -1   -1   0  -1
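If you would rather not load plyr just for each, the same idea can be sketched in base R. The helper each_base below is hypothetical (it is not part of plyr or base R, and is not how plyr implements each); it only illustrates combining several functions into one that returns a named vector:

```r
# Hypothetical base-R imitation of the idea behind plyr::each():
# take several named functions, return one function giving a named vector
each_base <- function(...) {
  fs <- list(...)
  function(x) sapply(fs, function(f) f(x))
}

df1 <- data.frame(Bar1 = c(1, 2, 3, -3, -2, -1),
                  Bar2 = c(3, 1, 3, -2, -3, -1))

both <- each_base(MFE = function(x) max(x, 0),
                  MAE = function(x) min(x, 0))

# apply() returns one column per row, so transpose before binding
res <- data.frame(df1, t(apply(df1, 1, both)))
res
```

As with the plyr version, the transpose is needed because apply() stacks the per-row results as columns.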

Solution 3: using vectorised functions

Using vectorised functions pmin and pmax, you can use this one-liner:

    transform(df1, MFE = pmax(0, Bar1, Bar2), MAE = pmin(0, Bar1, Bar2))
      Bar1 Bar2 MFE MAE
    1    1    3   3   0
    2    2    1   2   0
    3    3    3   3   0
    4   -3   -2   0  -3
    5   -2   -3   0  -3
    6   -1   -1   0  -1


There are lots of good answers here. I started this while Gavin Simpson was editing, so we cover some similar ground. What the parallel min and max (pmin and pmax) do is pretty much exactly what you're writing your functions for. It may be a little opaque what the 0 does in pmax(0, Bar1, Bar2), but essentially the 0 gets recycled, so it's like doing

    pmax(c(0, 0, 0, 0, 0, 0), Bar1, Bar2)

That takes the three vectors passed and finds the maximum across them at each position. So the result will be 0 wherever all the values were negative, which accomplishes much of what your ifelse statement did. You could rewrite things so that you get vectors and combine them with functions similar to what you were doing, which might make it a bit more transparent. In this case we'd just pass the data frame to a new parallel and fast findMFE function that will work with any numeric data frame and get out a vector.

    findMFE <- function(dataf){
        MFE <- do.call( pmax, c(dataf, 0, na.rm = TRUE))
    }
    MFE <- findMFE(df1)

What this function does is append a 0 to the columns of the passed data frame (pmax recycles it, so it acts like an extra column of 0s) and then call pmax with each column of df1 as a separate argument (data frames are lists, so this is easy).
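As a quick check on the recycling point, the three forms below should all give the same vector. This is only a sketch using the two columns from the example as plain vectors:

```r
Bar1 <- c(1, 2, 3, -3, -2, -1)
Bar2 <- c(3, 1, 3, -2, -3, -1)

# The single 0 is recycled to the length of the other arguments
recycled <- pmax(0, Bar1, Bar2)

# Writing the recycled vector out explicitly
explicit <- pmax(rep(0, length(Bar1)), Bar1, Bar2)

# Passing the same arguments as a list through do.call()
via_list <- do.call(pmax, list(Bar1, Bar2, 0))
```

The position of the 0 in the argument list makes no difference to the values returned.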

Now, I note that you actually want to correct for Inf values in your data that aren't in your example... we could add an extra line to your function...

    findMFE <- function(dataf){
        MFE <- do.call( pmax, c(dataf, 0, na.rm = TRUE))
        ifelse(is.infinite(MFE), 0, MFE)
    }

Now, that's proper use of the ifelse() function on a vector. I did it that way as an example for you but Gavin Simpson's use of MFE[is.infinite(MFE)] <- 0 is more efficient. Note that this findMFE function isn't used in a loop, it's just passed the whole data frame.

The comparable findMAE is...

    findMAE <- function(dataf){
        MAE <- do.call( pmin, c(dataf, 0, na.rm = TRUE))
        ifelse(is.infinite(MAE), 0, MAE)
    }

and the combined function is simply...

    findMFEandMAE <- function(dataf){
        MFE <- findMFE(dataf)
        MAE <- findMAE(dataf)
        return(data.frame(MFE, MAE))
    }

    MFEandMAE <- findMFEandMAE(df1)
    df1 <- cbind(df1, MFEandMAE)
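Since the example data contains no Inf values, here is a small sketch of the correction step on made-up data that does. The data frame dataf_inf is hypothetical, and the indexed assignment is used in place of ifelse() to show that variant:

```r
# Hypothetical data containing infinite values (not from the question)
dataf_inf <- data.frame(Bar1 = c(1, Inf, -2),
                        Bar2 = c(3, 2, -Inf))

# Before correction the infinite values pass straight through
MFE <- do.call(pmax, c(dataf_inf, 0, na.rm = TRUE))
MAE <- do.call(pmin, c(dataf_inf, 0, na.rm = TRUE))

# One vectorised replacement over the whole result, outside any loop
MFE[is.infinite(MFE)] <- 0
MAE[is.infinite(MAE)] <- 0
```

In the second row the Inf in Bar1 ends up as an MFE of 0, and in the third row the -Inf in Bar2 ends up as an MAE of 0.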

Some tips

If you've got a scalar condition, don't use ifelse(); use if () else. It's much faster in scalar situations. Your functions are scalar, and you're trying to vectorise them. ifelse() is already vectorised and runs very fast when used that way, but it is much slower than if () else when used on scalars.

Also, if you're going to be putting stuff in a loop or apply statement put as little in there as possible. For example, in your case the ifelse() really needed to be taken out of the loop and applied to the whole MFE result afterwards.
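A small sketch of both tips, using a made-up vector x and a hypothetical helper floor_at_zero. The first function makes a scalar decision with if () else; the last two lines contrast ifelse() called once per element inside a loop with a single call on the whole vector:

```r
x <- c(3, -1, Inf, 2)   # made-up values for illustration

# Scalar decision: plain if () else (hypothetical helper)
floor_at_zero <- function(v) if (v < 0) 0 else v

# Pattern to avoid: ifelse() called once per element inside a loop
in_loop <- sapply(x, function(v) ifelse(is.infinite(v), 0, v))

# Preferred: one vectorised ifelse() on the whole result afterwards
after_loop <- ifelse(is.infinite(x), 0, x)
```

Both in_loop and after_loop hold the same values; the difference is only in how many times ifelse() is called.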