Numpy mean AND variance from single function? Numpy mean AND variance from single function? numpy numpy

Numpy mean AND variance from single function?


You can't pass a known mean to np.std or np.var, you'll have to wait for the new standard library statistics module, but in the meantime you can save a little time by using the formula:

In [329]: a = np.random.rand(1000)In [330]: %%timeit   .....: a.mean()   .....: a.var()   .....: 10000 loops, best of 3: 80.6 µs per loopIn [331]: %%timeit   .....: m = a.mean()   .....: np.mean((a-m)**2)   .....: 10000 loops, best of 3: 60.9 µs per loopIn [332]: m = a.mean()In [333]: a.var()Out[333]: 0.078365856465916137In [334]: np.mean((a-m)**2)Out[334]: 0.078365856465916137

If you really are trying to speed things up, try np.dot to do the squaring and summing (since that's what a dot-product is):

In [335]: np.dot(a-m,a-m)/a.sizeOut[335]: 0.078365856465916137In [336]: %%timeit   .....: m = a.mean()   .....: c = a-m   .....: np.dot(c,c)/a.size   .....: 10000 loops, best of 3: 38.2 µs per loop


You can also avoid the subtraction by making use of the relation between mean, variance and power of a signal:

In [7]: import numpy as npIn [8]: a = np.random.rand(1000)In [9]: %%timeit   ...: a.mean()   ...: a.var()   ...: 10000 loops, best of 3: 24.7 us per loopIn [10]: %%timeit    ...: m = a.mean()    ...: np.mean((a-m)**2)    ...: 100000 loops, best of 3: 18.5 us per loopIn [11]: %%timeit    ...: m = a.mean()    ...: power = np.mean(a ** 2)    ...: power - m ** 2    ...: 100000 loops, best of 3: 17.3 us per loopIn [12]: %%timeit    ...: m = a.mean()    ...: power = np.dot(a, a) / a.size    ...: power - m ** 2    ...: 100000 loops, best of 3: 9.16 us per loop


I don't think NumPy provides a function that returns both the mean and the variance.

However, SciPy provides the function scipy.stats.norm.fit() which returns the mean and standard deviation of a sample. The function is named after its more specific purpose of fitting a normal distribution to a sample.

Example:

>>> import scipy.stats>>> scipy.stats.norm.fit([1,2,3])(2.0, 0.81649658092772603)

Note that fit() does not apply Bessel's correction to the standard deviation, so if you want that correction, you have to multiply by the appropriate factor.