
Pandas rolling regression: alternatives to looping


I created an ols module designed to mimic pandas' deprecated MovingOLS; it is here.

It has three core classes:

  • OLS : static (single-window) ordinary least-squares regression. The outputs are NumPy arrays.
  • RollingOLS : rolling (multi-window) ordinary least-squares regression. The outputs are higher-dimension NumPy arrays.
  • PandasRollingOLS : wraps the results of RollingOLS in pandas Series & DataFrames. Designed to mimic the look of the deprecated pandas module.
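As a reference point for the three classes listed above, here is a minimal looping sketch of the same idea. It is not the pyfinance code. `static_ols` and `looped_rolling_beta` are hypothetical names, and `np.linalg.lstsq` stands in for whatever solver the module uses. The per-window results are labelled by each window's last date. That matches the example output later on, where the first `beta` row (2001-01-01) is the 12th observation of data starting 2000-02-01.

```python
import numpy as np
import pandas as pd

def static_ols(y, x):
    """Single-window OLS with an intercept, returning NumPy arrays.

    y: shape (n,); x: shape (n, k). Returns (alpha, beta) with beta of shape (k,).
    """
    X = np.column_stack([np.ones(len(y)), x])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:]

def looped_rolling_beta(y: pd.Series, x: pd.DataFrame, window: int) -> pd.DataFrame:
    """Rolling slopes computed by looping over windows (the baseline to avoid).

    Each row is labelled with the date of the last observation in its window.
    """
    yv = y.to_numpy(dtype=float)
    xv = x.to_numpy(dtype=float)
    rows = [static_ols(yv[end - window:end], xv[end - window:end])[1]
            for end in range(window, len(yv) + 1)]
    return pd.DataFrame(rows, index=y.index[window - 1:], columns=x.columns)
```

This loop runs one small solve per window in Python. It is handy for checking a vectorized version, but it scales poorly with the number of windows.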

Note that the module is part of a package (which I'm currently in the process of uploading to PyPI), and it requires one inter-package import.

The first two classes above are implemented entirely in NumPy and rely primarily on matrix algebra. RollingOLS also makes extensive use of broadcasting. Attributes largely mimic those of statsmodels' OLS RegressionResultsWrapper.
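To illustrate the "matrix algebra plus broadcasting" approach described above, here is a minimal sketch that solves the normal equations X'X b = X'y for every window in one batched call. It assumes NumPy >= 1.20 (for `sliding_window_view`). `rolling_ols` is a hypothetical helper, not the pyfinance implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def rolling_ols(y, x, window):
    """Rolling OLS coefficients for every full window, without a Python loop.

    y: shape (n,); x: shape (n, k) or (n,). Returns an array of shape
    (n - window + 1, k + 1); column 0 is the intercept, and row i covers
    observations i .. i + window - 1.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones(len(y)), x])  # (n, k + 1) design matrix
    # Windows along axis 0; the window axis is appended last: (m, k + 1, window),
    # so Xw[i] is the transpose of window i's design matrix.
    Xw = sliding_window_view(X, window, axis=0)
    yw = sliding_window_view(y, window)        # (m, window)
    # Batched normal equations, one (k + 1) x (k + 1) system per window
    XtX = Xw @ np.swapaxes(Xw, 1, 2)           # (m, k + 1, k + 1)
    Xty = Xw @ yw[..., None]                   # (m, k + 1, 1)
    return np.linalg.solve(XtX, Xty)[..., 0]
```

With the FRED data from the example, `rolling_ols(data.usd, data.drop('usd', axis=1), 12)[:, 1:]` would be the loop-free analogue of `model.beta`, with row i belonging to `data.index[i + 11]`. Solving the normal equations is fast but less numerically robust than QR-based `lstsq`. A window whose regressors are collinear raises `LinAlgError`.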

An example:

```python
import urllib.parse
import pandas as pd
from pyfinance.ols import PandasRollingOLS

# You can also do this with pandas-datareader; here's the hard way
url = "https://fred.stlouisfed.org/graph/fredgraph.csv"

syms = {
    "TWEXBMTH" : "usd",
    "T10Y2YM" : "term_spread",
    "GOLDAMGBD228NLBM" : "gold",
}

params = {
    "fq": "Monthly,Monthly,Monthly",
    "id": ",".join(syms.keys()),
    "cosd": "2000-01-01",
    "coed": "2019-02-01",
}

data = pd.read_csv(
    url + "?" + urllib.parse.urlencode(params, safe=","),
    na_values={"."},
    parse_dates=["DATE"],
    index_col=0
).pct_change().dropna().rename(columns=syms)
print(data.head())
#                  usd  term_spread      gold
# DATE
# 2000-02-01  0.012580    -1.409091  0.057152
# 2000-03-01 -0.000113     2.000000 -0.047034
# 2000-04-01  0.005634     0.518519 -0.023520
# 2000-05-01  0.022017    -0.097561 -0.016675
# 2000-06-01 -0.010116     0.027027  0.036599

y = data.usd
x = data.drop('usd', axis=1)

window = 12  # months
model = PandasRollingOLS(y=y, x=x, window=window)

print(model.beta.head())  # Coefficients excluding the intercept
#             term_spread      gold
# DATE
# 2001-01-01     0.000033 -0.054261
# 2001-02-01     0.000277 -0.188556
# 2001-03-01     0.002432 -0.294865
# 2001-04-01     0.002796 -0.334880
# 2001-05-01     0.002448 -0.241902

print(model.fstat.head())
# DATE
# 2001-01-01    0.136991
# 2001-02-01    1.233794
# 2001-03-01    3.053000
# 2001-04-01    3.997486
# 2001-05-01    3.855118
# Name: fstat, dtype: float64

print(model.rsq.head())  # R-squared
# DATE
# 2001-01-01    0.029543
# 2001-02-01    0.215179
# 2001-03-01    0.404210
# 2001-04-01    0.470432
# 2001-05-01    0.461408
# Name: rsq, dtype: float64
```
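For readers wondering what `rsq` and `fstat` in the example output measure, here is a minimal NumPy sketch for a single window. It uses the textbook definitions R² = 1 - SSR/SST and F = (R²/k) / ((1 - R²)/(n - k - 1)). These are my assumption about what the module reports, not taken from its source. `ols_rsq_fstat` and `fstat_from_rsq` are hypothetical names.

```python
import numpy as np

def fstat_from_rsq(rsq, n, k):
    """F-test that all k slopes are zero, with k and n - k - 1 degrees of freedom."""
    return (rsq / k) / ((1.0 - rsq) / (n - k - 1))

def ols_rsq_fstat(y, x):
    """R-squared and overall F-statistic of an OLS fit with an intercept.

    y: shape (n,); x: shape (n, k) or (n,), regressors excluding the intercept.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    n, k = x.shape
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ssr = resid @ resid                # residual sum of squares
    sst = ((y - y.mean()) ** 2).sum()  # total sum of squares about the mean
    rsq = 1.0 - ssr / sst
    return rsq, fstat_from_rsq(rsq, n, k)
```

As a sanity check against the printed output, with a 12-month window and 2 regressors, `fstat_from_rsq(0.029543, 12, 2)` comes out at about 0.13699. That is in line with the first `fstat` row, which suggests `fstat` is this overall F-test with 2 and 9 degrees of freedom.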


Use a custom rolling apply function.

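```python
import numpy as np

window = 125
# 'value' is a placeholder: use the column you want to regress on time.
# (df.values is a NumPy array and has no .rolling method, so select a column.)
# np.polyfit(..., 1)[0] is the slope of a straight-line fit to each window.
df['slope'] = df['value'].rolling(window=window).apply(
    lambda y: np.polyfit(np.arange(window), y, 1)[0], raw=True
)
```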
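Calling `np.polyfit` once per window is still a Python-level loop. Because the regressor is just 0, 1, ..., window - 1, the least-squares slope has a closed form, sum((t - t_mean) * y) / sum((t - t_mean)**2), and it can be applied to every window at once. The sketch below assumes NumPy >= 1.20 for `sliding_window_view`. `rolling_slope` is a hypothetical helper, and it raises `ValueError` if the series is shorter than the window.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def rolling_slope(s: pd.Series, window: int) -> pd.Series:
    """Slope of a straight-line fit of s against 0..window-1 over each trailing window."""
    y = s.to_numpy(dtype=float)
    t = np.arange(window, dtype=float)
    tc = t - t.mean()  # centred time index; its entries sum to zero
    # Because tc sums to zero, sum(tc * y) / sum(tc**2) is the OLS slope per window
    slopes = sliding_window_view(y, window) @ tc / (tc @ tc)
    out = np.full(len(y), np.nan)  # the first window-1 rows have no full window
    out[window - 1:] = slopes
    return pd.Series(out, index=s.index, name="slope")
```

Usage would be `df['slope'] = rolling_slope(df['value'], 125)`, with 'value' again a placeholder column name. Any NaN inside a window makes that window's slope NaN, just as the rolling `apply` version does with its default `min_periods`.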