How to calculate rolling / moving average using python + NumPy / SciPy? How to calculate rolling / moving average using python + NumPy / SciPy? numpy numpy

How to calculate rolling / moving average using python + NumPy / SciPy?


If you just want a straightforward non-weighted moving average, you can easily implement it with np.cumsum, which may be is faster than FFT based methods:

EDIT Corrected an off-by-one wrong indexing spotted by Bean in the code. EDIT

def moving_average(a, n=3) :    ret = np.cumsum(a, dtype=float)    ret[n:] = ret[n:] - ret[:-n]    return ret[n - 1:] / n>>> a = np.arange(20)>>> moving_average(a)array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,        12.,  13.,  14.,  15.,  16.,  17.,  18.])>>> moving_average(a, n=4)array([  1.5,   2.5,   3.5,   4.5,   5.5,   6.5,   7.5,   8.5,   9.5,        10.5,  11.5,  12.5,  13.5,  14.5,  15.5,  16.5,  17.5])

So I guess the answer is: it is really easy to implement, and maybe numpy is already a little bloated with specialized functionality.


A simple way to achieve this is by using np.convolve.The idea behind this is to leverage the way the discrete convolution is computed and use it to return a rolling mean. This can be done by convolving with a sequence of np.ones of a length equal to the sliding window length we want.

In order to do so we could define the following function:

def moving_average(x, w):    return np.convolve(x, np.ones(w), 'valid') / w

This function will be taking the convolution of the sequence x and a sequence of ones of length w. Note that the chosen mode is valid so that the convolution product is only given for points where the sequences overlap completely.


Some examples:

x = np.array([5,3,8,10,2,1,5,1,0,2])

For a moving average with a window of length 2 we would have:

moving_average(x, 2)# array([4. , 5.5, 9. , 6. , 1.5, 3. , 3. , 0.5, 1. ])

And for a window of length 4:

moving_average(x, 4)# array([6.5 , 5.75, 5.25, 4.5 , 2.25, 1.75, 2.  ])

How does convolve work?

Lets have a more in depth look at the way the discrete convolution is being computed.The following function aims to replicate the way np.convolve is computing the output values:

def mov_avg(x, w):    for m in range(len(x)-(w-1)):        yield sum(np.ones(w) * x[m:m+w]) / w 

Which, for the same example above would also yield:

list(mov_avg(x, 2))# [4.0, 5.5, 9.0, 6.0, 1.5, 3.0, 3.0, 0.5, 1.0]

So what is being done at each step is to take the inner product between the array of ones and the current window. In this case the multiplication by np.ones(w) is superfluous given that we are directly taking the sum of the sequence.

Bellow is an example of how the first outputs are computed so that it is a little clearer. Lets suppose we want a window of w=4:

[1,1,1,1][5,3,8,10,2,1,5,1,0,2]= (1*5 + 1*3 + 1*8 + 1*10) / w = 6.5

And the following output would be computed as:

  [1,1,1,1][5,3,8,10,2,1,5,1,0,2]= (1*3 + 1*8 + 1*10 + 1*2) / w = 5.75

And so on, returning a moving average of the sequence once all overlaps have been performed.


NumPy's lack of a particular domain-specific function is perhaps due to the Core Team's discipline and fidelity to NumPy's prime directive: provide an N-dimensional array type, as well as functions for creating, and indexing those arrays. Like many foundational objectives, this one is not small, and NumPy does it brilliantly.

The (much) larger SciPy contains a much larger collection of domain-specific libraries (called subpackages by SciPy devs)--for instance, numerical optimization (optimize), signal processsing (signal), and integral calculus (integrate).

My guess is that the function you are after is in at least one of the SciPy subpackages (scipy.signal perhaps); however, i would look first in the collection of SciPy scikits, identify the relevant scikit(s) and look for the function of interest there.

Scikits are independently developed packages based on NumPy/SciPy and directed to a particular technical discipline (e.g., scikits-image, scikits-learn, etc.) Several of these were (in particular, the awesome OpenOpt for numerical optimization) were highly regarded, mature projects long before choosing to reside under the relatively new scikits rubric. The Scikits homepage liked to above lists about 30 such scikits, though at least several of those are no longer under active development.

Following this advice would lead you to scikits-timeseries; however, that package is no longer under active development; In effect, Pandas has become, AFAIK, the de facto NumPy-based time series library.

Pandas has several functions that can be used to calculate a moving average; the simplest of these is probably rolling_mean, which you use like so:

>>> # the recommended syntax to import pandas>>> import pandas as PD>>> import numpy as NP>>> # prepare some fake data:>>> # the date-time indices:>>> t = PD.date_range('1/1/2010', '12/31/2012', freq='D')>>> # the data:>>> x = NP.arange(0, t.shape[0])>>> # combine the data & index into a Pandas 'Series' object>>> D = PD.Series(x, t)

Now, just call the function rolling_mean passing in the Series object and a window size, which in my example below is 10 days.

>>> d_mva = PD.rolling_mean(D, 10)>>> # d_mva is the same size as the original Series>>> d_mva.shape    (1096,)>>> # though obviously the first w values are NaN where w is the window size>>> d_mva[:3]    2010-01-01         NaN    2010-01-02         NaN    2010-01-03         NaN

verify that it worked--e.g., compared values 10 - 15 in the original series versus the new Series smoothed with rolling mean

>>> D[10:15]     2010-01-11    2.041076     2010-01-12    2.041076     2010-01-13    2.720585     2010-01-14    2.720585     2010-01-15    3.656987     Freq: D>>> d_mva[10:20]      2010-01-11    3.131125      2010-01-12    3.035232      2010-01-13    2.923144      2010-01-14    2.811055      2010-01-15    2.785824      Freq: D

The function rolling_mean, along with about a dozen or so other function are informally grouped in the Pandas documentation under the rubric moving window functions; a second, related group of functions in Pandas is referred to as exponentially-weighted functions (e.g., ewma, which calculates exponentially moving weighted average). The fact that this second group is not included in the first (moving window functions) is perhaps because the exponentially-weighted transforms don't rely on a fixed-length window