How to apply a function not returning a numeric value to a pandas rolling Window?


Note that the apply function for a Rolling object is different from the apply function for a Series object, and I agree with you that this is a bit confusing. As far as I understand, functions applied to rolling windows are meant to aggregate data (sum, count, etc.), so Rolling.apply expects a single numeric value back from each window.
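
To make the difference concrete, here is a small sketch on a made-up five-element series (the data and variable names are illustrative, not taken from the question). A numeric aggregation goes through Rolling.apply, while a function returning a string is rejected. The exact exception type and message may vary between pandas versions, so the sketch catches both TypeError and ValueError.

```python
import pandas as pd

# Toy data, purely illustrative.
s = pd.Series([1.0, 4.0, 2.0, 8.0, 5.0])

# A numeric aggregation works: each full window is reduced to one float.
spread = s.rolling(3).apply(lambda w: w.max() - w.min())
print(spread.tolist())

# A string-returning function fails, because Rolling.apply stores
# each result in a float array.
try:
    s.rolling(3).apply(lambda w: "High")
    string_result_ok = True
except (TypeError, ValueError) as exc:
    string_result_ok = False
    print(f"{type(exc).__name__}: {exc}")
```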

However, you can convert your rolling windows to a list and apply the function to that list (thanks to this discussion).

So my approach would be:

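    import numpy as np
    import pandas as pd

    np.random.seed(1)
    number_series = pd.Series(
        np.random.randint(low=1, high=100, size=100),
        index=[pd.date_range(start='2000-01-01', freq='W', periods=100)],
    )
    number_series = number_series.astype(float)

    def func(s):
        # The first windows hold fewer than three values.
        if len(s) > 2:
            # .iloc gives positional access regardless of the index labels.
            if s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
                return 'High'
            elif s.iloc[-1] > s.iloc[-2]:
                return 'Medium'
            else:
                return 'Low'
        else:
            return ''

    # Iterating over a Rolling object yields one Series per window.
    windows = list(number_series.rolling(5))
    # Named `results` rather than `list`, which would shadow the builtin.
    results = [func(window) for window in windows]
    new_series = pd.Series(results, index=number_series.index)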

Also note that func needs to handle the first windows differently: they contain fewer than three values, so indexing the third-to-last element would otherwise be out of bounds.
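
A quick way to see why the first windows need special handling is to print the window lengths. This is a minimal sketch on toy data, assuming pandas 1.1 or later (where Rolling objects can be iterated).

```python
import pandas as pd

# Toy data, purely illustrative.
s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# Iterating over a Rolling object yields one Series per row,
# including the partial windows at the start of the series.
window_lengths = [len(window) for window in s.rolling(3)]
print(window_lengths)  # [1, 2, 3, 3, 3]
```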


One approach is to:

  1. Get the WindowIndexer of the rolling() method.
  2. Apply func, which returns a string, to each window and store the results in a list.
  3. Convert the results back to a Series.

    import numpy as np
    import pandas as pd

    np.random.seed(1)
    number_series = pd.Series(
        np.random.randint(low=1, high=100, size=100),
        index=[pd.date_range(start='2000-01-01', freq='W', periods=100)],
    )
    number_series = number_series.astype(float)

    def func(s):
        # .iloc gives positional access regardless of the index labels.
        if (len(s) >= 3) and (s.iloc[-1] > s.iloc[-2] > s.iloc[-3]):
            return 'High'
        elif (len(s) >= 2) and s.iloc[-1] > s.iloc[-2]:
            return 'Medium'
        else:
            return 'Low'

    # Step 1: Get the window indexer (_get_window_indexer is a private pandas method)
    window_indexer = number_series.rolling(5)._get_window_indexer()
    start, end = window_indexer.get_window_bounds(num_values=len(number_series))

    # Step 2: Apply func
    results = [func(number_series.iloc[slice(s, e)]) for s, e in zip(start, end)]

    # Step 3: Get results back to a pandas Series
    new_series = pd.Series(results, index=number_series.index)
    new_series

Output:

    2000-01-02       Low
    2000-01-09       Low
    2000-01-16    Medium
    2000-01-23       Low
    2000-01-30    Medium
                   ...
    2001-10-28       Low
    2001-11-04    Medium
    2001-11-11      High
    2001-11-18      High
    2001-11-25       Low
    Length: 100, dtype: object
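
Because _get_window_indexer() is a private pandas method (note the leading underscore), it may change between releases. As a hedged alternative for the fixed-size, trailing-window case only, the same bounds can be computed by hand. The helper names classify and rolling_labels below are hypothetical, not part of pandas, and the data is a toy example.

```python
import pandas as pd


def classify(window):
    """Label a window by comparing its last three values."""
    if len(window) >= 3 and window.iloc[-1] > window.iloc[-2] > window.iloc[-3]:
        return "High"
    if len(window) >= 2 and window.iloc[-1] > window.iloc[-2]:
        return "Medium"
    return "Low"


def rolling_labels(series, window_size, label_func):
    """Apply label_func to each trailing window, partial windows included."""
    results = []
    for i in range(len(series)):
        start = max(0, i - window_size + 1)  # clip at the start of the series
        results.append(label_func(series.iloc[start:i + 1]))
    return pd.Series(results, index=series.index)


# Toy data, purely illustrative.
demo = pd.Series(
    [38.0, 13.0, 73.0, 10.0, 76.0],
    index=pd.date_range(start="2000-01-01", freq="W", periods=5),
)
labels = rolling_labels(demo, 5, classify)
print(labels.tolist())  # ['Low', 'Low', 'Medium', 'Low', 'Medium']
```

This trades the generality of the window indexer (centered or variable-size windows are not handled) for relying on public API only.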


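Here's another way, using a boolean 'or' trick with a list and the pd.Series constructor. The function appends its string result to a list as a side effect, and 'or 0' hands Rolling.apply the numeric value it requires: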

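    import numpy as np
    import pandas as pd

    np.random.seed(1)
    number_series = pd.Series(
        np.random.randint(low=1, high=100, size=100),
        index=[pd.date_range(start='2000-01-01', freq='W', periods=100)],
    )
    number_series = number_series.astype(float)

    def func(s):
        if s[-1] > s[-2] > s[-3]:
            return 'High'
        elif s[-1] > s[-2]:
            return 'Medium'
        else:
            return 'Low'

    labels = []
    # list.append returns None, so the lambda returns `None or 0`, i.e. 0.
    # raw=True passes each window as a NumPy array, so s[-1] is positional.
    number_series.rolling(5).apply(lambda x: labels.append(func(x)) or 0, raw=True)

    # func only runs on full windows, so the labels belong to the
    # last len(labels) index entries, not the first ones.
    new_series = pd.Series(labels, index=number_series.index[-len(labels):])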
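
The alignment of the collected labels is easy to get wrong, so here is a minimal sketch on toy data that checks it. It assumes the default min_periods (equal to the window size) and no missing values, in which case the function runs once per full window, in order. The name classify and the sample values are illustrative only.

```python
import pandas as pd

# Toy data, purely illustrative.
s = pd.Series(
    [38.0, 13.0, 73.0, 10.0, 76.0, 80.0],
    index=pd.date_range(start="2000-01-01", freq="W", periods=6),
)


def classify(window):
    """window is a NumPy array holding one full rolling window."""
    if window[-1] > window[-2] > window[-3]:
        return "High"
    if window[-1] > window[-2]:
        return "Medium"
    return "Low"


labels = []
# list.append returns None, so the lambda returns `None or 0`, i.e. 0.
s.rolling(3).apply(lambda w: labels.append(classify(w)) or 0, raw=True)

# Six rows and a window of three give four full windows,
# ending at positions 2, 3, 4 and 5.
labelled = pd.Series(labels, index=s.index[-len(labels):])
print(labelled.tolist())  # ['Medium', 'Low', 'Medium', 'High']
```

If a result for every row is needed, labelled.reindex(s.index) would leave the first rows as missing values instead of shifting the labels.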