How to apply a function not returning a numeric value to a pandas rolling Window?
Note that the apply function for a Rolling object is different from the apply function for a Series object, and I agree with you that this is a bit confusing. In my understanding, the functions applied to rolling windows are typically meant for aggregation of data (such as sum, count, etc.), so they are expected to return a single number per window.
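To make the difference concrete, here is a minimal sketch on a toy series. The names s and label and the values are made up for illustration. Series.apply calls the function once per element and accepts any return type, while Rolling.apply calls it once per window and expects a number back. The exception type caught below is what I would expect from current pandas versions; it may differ in others.

```python
import pandas as pd

s = pd.Series([1.0, 3.0, 2.0, 5.0])

def label(x):
    # Hypothetical element-wise function returning a string
    return 'big' if x > 2 else 'small'

# Series.apply: called once per element, any return type is fine
elementwise = s.apply(label)

# Rolling.apply: called once per window, must return a number
rolling_sum = s.rolling(2).apply(lambda w: w.sum())

# Returning a string from a rolling function is rejected
try:
    s.rolling(2).apply(lambda w: 'High')
    rolling_failed = False
except (TypeError, ValueError):
    rolling_failed = True
```

This is why the string labels have to be collected outside of Rolling.apply.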
However, you can convert your rolling windows to a list and apply the function to that list (thanks to this discussion).
So my approach would be:
    import numpy as np
    import pandas as pd

    np.random.seed(1)
    number_series = pd.Series(
        np.random.randint(low=1, high=100, size=100),
        index=pd.date_range(start='2000-01-01', freq='W', periods=100),
    ).astype(float)

    def func(s):
        # s is the Series for one window; use .iloc for positional access
        if len(s) > 2:
            if s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
                return 'High'
            elif s.iloc[-1] > s.iloc[-2]:
                return 'Medium'
            else:
                return 'Low'
        else:
            return ''

    # Iterating over a Rolling object yields one Series per window
    labels = [func(window) for window in number_series.rolling(5)]
    new_series = pd.Series(labels, index=number_series.index)
Also note that func needs to handle the first windows differently: they contain fewer than three values, so the negative indices would otherwise be out of bounds.
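A small sketch of why the first windows are shorter, assuming a pandas version recent enough to support iterating over a Rolling object (the toy series is made up for illustration):

```python
import pandas as pd

toy = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# Iterating a Rolling object yields one Series per position,
# including the shorter windows at the start
windows = list(toy.rolling(3))
window_lengths = [len(w) for w in windows]
print(window_lengths)  # [1, 2, 3, 3, 3]

# The last window holds the three most recent values
last_window = windows[-1].tolist()
print(last_window)  # [30.0, 40.0, 50.0]
```

With a window of 3, the first two windows have only one and two values, so an unguarded s.iloc[-3] would raise an IndexError there.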
One approach is to:
- Get the WindowIndexer of the rolling() method.
- Apply func, which returns a string, and store the results in a list.
- Convert the results back to a Series.
    import numpy as np
    import pandas as pd

    np.random.seed(1)
    number_series = pd.Series(
        np.random.randint(low=1, high=100, size=100),
        index=pd.date_range(start='2000-01-01', freq='W', periods=100),
    ).astype(float)

    def func(s):
        if (len(s) >= 3) and (s.iloc[-1] > s.iloc[-2] > s.iloc[-3]):
            return 'High'
        elif (len(s) >= 2) and s.iloc[-1] > s.iloc[-2]:
            return 'Medium'
        else:
            return 'Low'

    # Step 1: Get the window indexer (note: _get_window_indexer is a private method)
    window_indexer = number_series.rolling(5)._get_window_indexer()
    start, end = window_indexer.get_window_bounds(num_values=len(number_series))

    # Step 2: Apply func to each window slice
    results = [func(number_series.iloc[slice(s, e)]) for s, e in zip(start, end)]

    # Step 3: Get results back to a pandas Series
    new_series = pd.Series(results, index=number_series.index)
    new_series
    >>>
    2000-01-02       Low
    2000-01-09       Low
    2000-01-16    Medium
    2000-01-23       Low
    2000-01-30    Medium
                   ...
    2001-10-28       Low
    2001-11-04    Medium
    2001-11-11      High
    2001-11-18      High
    2001-11-25       Low
    Length: 100, dtype: object
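Here's another way, using the boolean 'or' trick with a list and the pd.Series constructor. list.append returns None, so the lambda evaluates to 0, which satisfies Rolling.apply while the string labels accumulate in the list: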
    import numpy as np
    import pandas as pd

    np.random.seed(1)
    number_series = pd.Series(
        np.random.randint(low=1, high=100, size=100),
        index=pd.date_range(start='2000-01-01', freq='W', periods=100),
    ).astype(float)

    def func(s):
        if s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
            return 'High'
        elif s.iloc[-1] > s.iloc[-2]:
            return 'Medium'
        else:
            return 'Low'

    labels = []
    # The numeric result of apply is discarded; only the side effect on labels matters
    number_series.rolling(5).apply(lambda x: labels.append(func(x)) or 0)
    # apply only sees full windows, so the labels belong to the last len(labels) dates
    pd.Series(labels, index=number_series.index[-len(labels):])
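To see the 'or' trick in isolation, here is a minimal sketch on a toy series. The names toy, collected and record are made up for illustration, and raw=True is my own choice here so that each window arrives as a NumPy array.

```python
import pandas as pd

toy = pd.Series([1.0, 2.0, 3.0, 2.0, 5.0, 6.0])
collected = []

def record(w):
    # list.append returns None, so `None or 0` evaluates to 0,
    # which gives Rolling.apply the number it expects
    return collected.append('up' if w[-1] > w[-2] else 'down') or 0

# raw=True passes each window as a NumPy array
dummy = toy.rolling(3).apply(record, raw=True)

# Only full windows reach the function: 6 values, window 3 -> 4 calls
labels = pd.Series(collected, index=toy.index[-len(collected):])
```

Because the function is only called for full windows, the collected list is shorter than the input by window - 1 entries, and its labels line up with the last positions of the index.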