How to predict time series in scikit-learn?
According to Wikipedia, EWMA works well with stationary data, but it does not work as expected in the presence of trends or seasonality. In those cases you should use a second- or third-order EWMA method, respectively. I decided to look at the pandas ewma function to see how it handled trends, and this is what I came up with:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def ewma(values, **kwargs):
    # pandas.stats.moments.ewma has been removed from pandas;
    # Series.ewm(...).mean() is the current equivalent
    return pd.Series(values).ewm(**kwargs).mean().to_numpy()

# make a hat function, and add noise
x = np.linspace(0, 1, 100)
x = np.hstack((x, x[::-1]))
x += np.random.normal(loc=0, scale=0.1, size=200)
plt.plot(x, alpha=0.4, label='Raw')

# take EWMA in both directions with a smaller span term
fwd = ewma(x, span=15)           # take EWMA in fwd direction
bwd = ewma(x[::-1], span=15)     # take EWMA in bwd direction
c = np.vstack((fwd, bwd[::-1]))  # lump fwd and bwd together
c = np.mean(c, axis=0)           # average

# regular EWMA, with bias against trend
plt.plot(ewma(x, span=20), 'b', label='EWMA, span=20')

# "corrected" (?) EWMA
plt.plot(c, 'r', label='Reversed-Recombined')

plt.legend(loc=8)
plt.savefig('ewma_correction.png', format='png', dpi=100)
As you can see, the plain EWMA lags behind the trend both uphill and downhill. We can correct for this (without having to implement a second-order scheme ourselves) by taking the EWMA in both directions and then averaging. I hope your data was stationary!
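To put a number on that lag without relying on the plot, one option is to compare each smoother against the noise-free hat function. This is only a sketch: it assumes a recent pandas, where `Series.ewm(span=...).mean()` stands in for the old `pandas.stats.moments.ewma`, and the fixed seed, the `clean` array, and the mean-absolute-error metric are my own choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Noise-free hat function and a noisy copy of it
ramp = np.linspace(0, 1, 100)
clean = np.hstack((ramp, ramp[::-1]))
noisy = clean + rng.normal(loc=0, scale=0.1, size=clean.size)

def ewma(values, span):
    """EWMA via Series.ewm (assumed stand-in for the removed pandas ewma)."""
    return pd.Series(values).ewm(span=span).mean().to_numpy()

fwd = ewma(noisy, span=15)              # forward pass
bwd = ewma(noisy[::-1], span=15)[::-1]  # backward pass, flipped back into time order
both = (fwd + bwd) / 2                  # average of the two passes
plain = ewma(noisy, span=20)            # ordinary one-directional EWMA

def mae(estimate):
    """Mean absolute error against the noise-free signal."""
    return float(np.mean(np.abs(estimate - clean)))

print("plain EWMA (span=20):", mae(plain))
print("forward/backward average (span=15):", mae(both))
```

The averaged curve should come out with the smaller error, because the forward and backward lags have opposite signs on the ramps and largely cancel; near the peak both passes undershoot, so some bias remains there. Keep in mind that the backward pass looks at later samples, so this is a way to smooth data you already have rather than a way to produce a live forecast.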
This might be what you're looking for, with regard to the exponentially weighted moving average:
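import numpy
import pandas

# pandas.stats.moments.ewma has been removed from pandas;
# Series.ewm(...).mean() is the current equivalent.
# ravel() flattens ys in case it is stored as a column vector.
EMOV_n = pandas.Series(numpy.ravel(ys)).ewm(com=2).mean().to_numpy()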
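Here, com is the center-of-mass parameter of the EWMA (the smoothing factor is alpha = 1 / (1 + com)); it is described in the pandas documentation. Then you can append EMOV_n to Xs as an extra feature column (one value per sample), using something like:
Xs = numpy.column_stack((Xs, EMOV_n))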
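On the com parameter: pandas accepts several equivalent ways of setting the smoothing factor, with alpha = 1 / (1 + com) and alpha = 2 / (span + 1). A small check of that relationship, assuming a recent pandas with the `Series.ewm` API (the sample values here are made up):

```python
import numpy as np
import pandas as pd

ys = pd.Series([1.0, 2.3, 3.1, 2.8, 4.0, 7.7])  # made-up sample values

by_com = ys.ewm(com=2).mean()          # com=2  -> alpha = 1 / (1 + 2) = 1/3
by_alpha = ys.ewm(alpha=1 / 3).mean()  # same smoothing factor given directly
by_span = ys.ewm(span=5).mean()        # span=5 -> alpha = 2 / (5 + 1) = 1/3

# The three parameterizations should describe the same average
print(np.allclose(by_com, by_alpha))
print(np.allclose(by_com, by_span))
```

A larger com (or span) means a smaller alpha, so older observations keep more weight and the average reacts more slowly.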
And then you can look at the various linear models in scikit-learn and do something like:
from sklearn import linear_model

clf = linear_model.LinearRegression()
clf.fit(Xs, ys)
print(clf.coef_)
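Putting the pieces together, here is a minimal end-to-end sketch on synthetic data. It makes two assumptions that are not part of the answer above: a recent pandas (`Series.ewm`), and a one-step shift of the EWMA so that the feature for row t only uses ys up to t-1. Without the shift, the feature already contains the value being predicted.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for Xs and ys: 200 samples, 3 features
n_samples = 200
Xs = rng.normal(size=(n_samples, 3))
ys = Xs @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n_samples)

# EWMA of the target, shifted by one step so row t sees only ys[:t]
emov = pd.Series(ys).ewm(com=2).mean().shift(1)

# Append the EWMA as an extra feature column
features = np.column_stack((Xs, emov.to_numpy()))

# The first row has no past value (NaN after the shift), so drop it
features, target = features[1:], ys[1:]

clf = LinearRegression()
clf.fit(features, target)
print(clf.coef_)  # one coefficient per column: 3 original features + EWMA
```

Whether the EWMA column helps depends on how much the target depends on its own recent history; on this synthetic data it does not, so its coefficient should stay close to zero.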
Best of luck!