Pandas rolling regression: alternatives to looping
I created an `ols` module designed to mimic pandas' deprecated `MovingOLS`; it lives in the `pyfinance` package as `pyfinance.ols`.
It has three core classes:

- `OLS`: static (single-window) ordinary least-squares regression. The outputs are NumPy arrays.
- `RollingOLS`: rolling (multi-window) ordinary least-squares regression. The outputs are higher-dimensional NumPy arrays.
- `PandasRollingOLS`: wraps the results of `RollingOLS` in pandas Series and DataFrames. Designed to mimic the look of the deprecated pandas module.
Note that the module is part of a package (which I'm currently in the process of uploading to PyPI) and it requires one inter-package import.
The first two classes above are implemented entirely in NumPy and primarily use matrix algebra. `RollingOLS` also makes extensive use of broadcasting. Attributes largely mimic statsmodels' OLS `RegressionResultsWrapper`.
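To give a feel for how a loop-free rolling regression can be built from matrix algebra and broadcasting, here is a minimal sketch. It is not the module's actual implementation: the function name `rolling_ols` is hypothetical, and it assumes NumPy 1.20 or later for `sliding_window_view`. It solves the normal equations for every window in one batched call.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_ols(y, x, window):
    """Hypothetical helper: OLS coefficients (intercept first) for each window."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]

    # Design matrix with a leading column of ones for the intercept: (n, k + 1)
    X = np.column_stack([np.ones(len(x)), x])

    # Stack every window without copying: (n_windows, window, k + 1)
    Xw = sliding_window_view(X, window, axis=0).transpose(0, 2, 1)
    yw = sliding_window_view(y, window)  # (n_windows, window)

    # Batched normal equations: (X'X) b = X'y, one system per window
    Xt = Xw.transpose(0, 2, 1)
    XtX = Xt @ Xw                # (n_windows, k + 1, k + 1)
    Xty = Xt @ yw[..., None]     # (n_windows, k + 1, 1)
    return np.linalg.solve(XtX, Xty)[..., 0]


# Synthetic data with known coefficients (illustration only)
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(30, 2))
y_demo = 1.0 + 2.0 * x_demo[:, 0] - 3.0 * x_demo[:, 1]
betas = rolling_ols(y_demo, x_demo, window=12)  # shape: (19, 3)
```

Solving the normal equations directly is fast but can be numerically fragile when regressors are nearly collinear; a per-window `np.linalg.lstsq` is slower but more robust.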
An example:
```python
import urllib.parse
import pandas as pd
from pyfinance.ols import PandasRollingOLS

# You can also do this with pandas-datareader; here's the hard way
url = "https://fred.stlouisfed.org/graph/fredgraph.csv"

syms = {
    "TWEXBMTH" : "usd",
    "T10Y2YM" : "term_spread",
    "GOLDAMGBD228NLBM" : "gold",
}

params = {
    "fq": "Monthly,Monthly,Monthly",
    "id": ",".join(syms.keys()),
    "cosd": "2000-01-01",
    "coed": "2019-02-01",
}

data = pd.read_csv(
    url + "?" + urllib.parse.urlencode(params, safe=","),
    na_values={"."},
    parse_dates=["DATE"],
    index_col=0
).pct_change().dropna().rename(columns=syms)
print(data.head())
#                  usd  term_spread      gold
# DATE
# 2000-02-01  0.012580    -1.409091  0.057152
# 2000-03-01 -0.000113     2.000000 -0.047034
# 2000-04-01  0.005634     0.518519 -0.023520
# 2000-05-01  0.022017    -0.097561 -0.016675
# 2000-06-01 -0.010116     0.027027  0.036599

y = data.usd
x = data.drop('usd', axis=1)

window = 12  # months
model = PandasRollingOLS(y=y, x=x, window=window)

print(model.beta.head())  # Coefficients excluding the intercept
#             term_spread      gold
# DATE
# 2001-01-01     0.000033 -0.054261
# 2001-02-01     0.000277 -0.188556
# 2001-03-01     0.002432 -0.294865
# 2001-04-01     0.002796 -0.334880
# 2001-05-01     0.002448 -0.241902

print(model.fstat.head())
# DATE
# 2001-01-01    0.136991
# 2001-02-01    1.233794
# 2001-03-01    3.053000
# 2001-04-01    3.997486
# 2001-05-01    3.855118
# Name: fstat, dtype: float64

print(model.rsq.head())  # R-squared
# DATE
# 2001-01-01    0.029543
# 2001-02-01    0.215179
# 2001-03-01    0.404210
# 2001-04-01    0.470432
# 2001-05-01    0.461408
# Name: rsq, dtype: float64
```
Use a custom rolling apply function.
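```python
import numpy as np

# Roll over a Series: `df.values` is a NumPy array and has no `.rolling`.
# "y" is a placeholder for the column being regressed on time.
window = 125
t = np.arange(window)
df['slope'] = df['y'].rolling(window=window).apply(
    lambda v: np.polyfit(t, v, 1)[0], raw=True
)
```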
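As a quick sanity check of the rolling-apply approach, the sketch below runs it on a synthetic series whose slope is known in advance. The column name `y`, the data, and the small window of 5 are assumptions made for illustration; `rolling` is called on a pandas Series, since `.rolling` is a pandas method rather than a NumPy one.

```python
import numpy as np
import pandas as pd

window = 5             # small window for illustration only
t = np.arange(window)  # time index 0..window-1, reused for every window

# Hypothetical data: a straight line that rises by 2 per step
df = pd.DataFrame({"y": 2.0 * np.arange(20) + 1.0})

# raw=True passes each window to the lambda as a NumPy array;
# np.polyfit(..., 1)[0] is the fitted slope of that window
df["slope"] = df["y"].rolling(window=window).apply(
    lambda v: np.polyfit(t, v, 1)[0], raw=True
)
```

The first `window - 1` rows of `slope` are NaN because their windows are incomplete; every later row should recover the slope of the underlying line. Note that `apply` still calls the Python function once per window, so on long series this is considerably slower than a vectorized approach.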