How to get mean, median, and other statistics over entire matrix, array or dataframe?
Since this comes up a fair bit, I'm going to treat this a little more comprehensively, to include the 'etc.' piece in addition to `mean` and `median`.
For a matrix or array, as the others have stated, `mean` and `median` will return a single value. However, `var` will compute the covariances between the columns of a two-dimensional matrix. Interestingly, for a multi-dimensional array, `var` goes back to returning a single value. `sd` on a 2-d matrix will work, but is deprecated, returning the standard deviation of the columns. Even better, `mad` returns a single value on a 2-d matrix and a multi-dimensional array. If you want a single value returned, the safest route is to coerce using `as.vector()` first. Having fun yet?

For a `data.frame`, `mean` is deprecated, but will again act on the columns separately. `median` requires that you coerce to a vector first, or `unlist`. As before, `var` will return the covariances, and `sd` is again deprecated but will return the standard deviation of the columns. `mad` requires that you coerce to a vector or `unlist`. In general, if you want something to act on all values of a `data.frame`, just `unlist` it first.
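To make the cases above concrete, here is a minimal sketch. The object names (`m`, `a`, `df`) and the random data are made up for illustration, and only the shape of each result is checked, not the values.

```r
set.seed(1)                               # arbitrary seed, only for reproducibility
m  <- matrix(rnorm(20), nrow = 5)         # 5 x 4 numeric matrix
a  <- array(rnorm(24), dim = c(2, 3, 4))  # 3-d array
df <- as.data.frame(m)                    # all-numeric data frame

# mean, median and mad collapse a matrix to a single value
n_mean   <- length(mean(m))
n_median <- length(median(m))
n_mad    <- length(mad(m))

# var returns a covariance matrix for a 2-d matrix ...
var_dim <- dim(var(m))
# ... but a single value for a 3-d array
n_var_array <- length(var(a))

# coercing first removes the ambiguity
var_all <- var(as.vector(m))
sd_all  <- sd(as.vector(m))

# for a data frame, unlist() first so the function sees every value
df_mean   <- mean(unlist(df))
df_median <- median(unlist(df))
df_mad    <- mad(unlist(df))
df_var    <- var(unlist(df))
```

Since `df` is built from `m`, `mean(unlist(df))` and `mean(m)` should agree, which is a quick sanity check that `unlist` really does cover all the values.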
Edit: Late breaking news(): In R 3.0.0 mean.data.frame is defunctified:
o mean() for data frames and sd() for data frames and matrices are defunct.
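If you relied on the old column-wise behaviour, one possible set of replacements is sketched below. The data frame `df` and the result names are invented for the example. This is just one way to get per-column and overall statistics, not necessarily what the release notes recommend.

```r
# Small, made-up data frame so the results are easy to verify by hand
df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

# Per-column statistics, stated explicitly
col_means <- colMeans(df)      # x = 2, y = 20
col_sds   <- sapply(df, sd)    # x = 1, y = 10

# The same idea for a matrix: apply sd over columns (MARGIN = 2)
m <- as.matrix(df)
mat_col_sds <- apply(m, 2, sd)

# One statistic over every value in the data frame
overall_mean <- mean(unlist(df))   # 11
```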
By default, `mean`, `median`, etc. work over an entire array or matrix.
E.g.:
    # array:
    m <- array(runif(100), dim = c(10, 10))
    mean(m)  # returns *one* value.

    # matrix:
    mean(as.matrix(m))  # same as before
For data frames, you can coerce them to a matrix first (the reason this is by default over columns is that a data frame can have columns with strings in them, which you can't take the mean of):
    # data frame
    mdf <- as.data.frame(m)
    # mean(mdf) returned column means before R 3.0.0 (defunct since; use colMeans(mdf))
    mean(as.matrix(mdf))  # one value.
Just be careful that your data frame has only numeric columns before coercing to a matrix, or exclude the non-numeric ones.
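One way to exclude the non-numeric columns is sketched here. The data frame and the names `is_num` and `num_df` are hypothetical, chosen only for illustration.

```r
# Made-up data frame with one character column and two numeric columns
df <- data.frame(id = c("a", "b", "c"),
                 x  = c(1, 2, 3),
                 y  = c(4, 5, 6),
                 stringsAsFactors = FALSE)

# Coercing the whole thing yields a character matrix, which mean() cannot use
whole_is_character <- is.character(as.matrix(df))

# Keep only the numeric columns, then coerce
is_num <- vapply(df, is.numeric, logical(1))
num_df <- df[is_num]

overall_mean   <- mean(as.matrix(num_df))    # mean of 1..6
overall_median <- median(as.matrix(num_df))  # median of 1..6
```

`vapply` with `logical(1)` is used instead of `sapply` so the result is guaranteed to be a logical vector, even for a data frame with no columns.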