
Number of rows in a rolling window of 30 days


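import pandas as pd

def get_rolling_amount(grp, freq):
    # Count, per column, the non-null values in the window ending at each row's Date;
    # closed="both" includes both edges of the window.
    return grp.rolling(freq, on="Date", closed="both").count()

df["Date"] = pd.to_datetime(df["Date"])
df["Amount"] = df.groupby("Account").apply(get_rolling_amount, "30D").values
print(df)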

Prints:

    Account       Date Amount
0        10 2020-06-01      1
1        10 2020-06-11      2
2        10 2020-06-21      3
3        10 2020-06-25      4
4        10 2020-07-11      4
5        10 2020-07-15      4
6        11 2020-06-01      1
7        11 2020-06-11      2
8        11 2020-06-21      3
9        11 2020-06-25      4
10       11 2020-07-11      4
11       11 2020-07-15      4
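To see why closed="both" matters here, the following is a minimal sketch that rebuilds the sample dates from the printed output (the input frame is an assumption, since the question's data is not shown) and counts rows per account with an explicit loop over groups. The helper name rolling_row_count and the AmountRightClosed column are made up for illustration.

```python
import pandas as pd

# Sample data reconstructed from the printed output (assumed input).
dates = ["2020-06-01", "2020-06-11", "2020-06-21",
         "2020-06-25", "2020-07-11", "2020-07-15"]
df = pd.DataFrame({
    "Account": [10] * 6 + [11] * 6,
    "Date": pd.to_datetime(dates * 2),
})

def rolling_row_count(grp, freq="30D", closed="both"):
    """Count the rows of one account in the window ending at each row's date."""
    # Hypothetical helper: a column of ones indexed by Date, so counting it
    # counts rows. Dates must already be sorted within the group.
    ones = pd.Series(1, index=pd.DatetimeIndex(grp["Date"]))
    counts = ones.rolling(freq, closed=closed).count()
    # Re-attach the original row labels so the result aligns with df.
    return pd.Series(counts.to_numpy(), index=grp.index).astype(int)

df["Amount"] = pd.concat(
    [rolling_row_count(grp) for _, grp in df.groupby("Account")]
)
# Same computation with the default right-closed window, for comparison.
df["AmountRightClosed"] = pd.concat(
    [rolling_row_count(grp, closed="right") for _, grp in df.groupby("Account")]
)
print(df)
```

2020-06-11 and 2020-07-11 are exactly 30 days apart, so the row for 2020-07-11 counts 4 with closed="both" but only 3 with the default right-closed window.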


You can use broadcasting within each group to count how many rows fall within X days.

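import pandas as pd

def within_days(s, days):
    dates = s.to_numpy()
    upper = (s + pd.offsets.DateOffset(days=days)).to_numpy()
    # mask[i, j] is True when date j lies in [date i, date i + days];
    # summing over i counts, for each row j, the rows in its trailing window.
    arr = ((dates >= dates[:, None]) & (dates <= upper[:, None])).sum(axis=0)
    return pd.Series(arr, index=s.index)

# group_keys=False keeps the original row index so the result aligns with df
df['Amount'] = df.groupby('Account', group_keys=False)['Date'].apply(within_days, days=30)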

    Account       Date  Amount
0        10 2020-06-01       1
1        10 2020-06-11       2
2        10 2020-06-21       3
3        10 2020-06-25       4
4        10 2020-07-11       4
5        10 2020-07-15       4
6        11 2020-06-01       1
7        11 2020-06-11       2
8        11 2020-06-21       3
9        11 2020-06-25       4
10       11 2020-07-11       4
11       11 2020-07-15       4
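The broadcast mask is easier to follow on a single account. Here is a small NumPy-only sketch using one account's dates from the output (assumed input); the names dates, window, mask and counts are illustrative only.

```python
import numpy as np

# One account's dates, taken from the printed output (assumed input).
dates = np.array(
    ["2020-06-01", "2020-06-11", "2020-06-21",
     "2020-06-25", "2020-07-11", "2020-07-15"],
    dtype="datetime64[D]",
)
window = np.timedelta64(30, "D")

# dates[:, None] has shape (n, 1) and dates has shape (n,), so each
# comparison broadcasts to an (n, n) boolean matrix.
# mask[i, j] is True when dates[j] lies in [dates[i], dates[i] + window].
mask = (dates >= dates[:, None]) & (dates <= dates[:, None] + window)

# Summing down each column counts, for row j, the rows i that fall in the
# 30 days up to and including dates[j].
counts = mask.sum(axis=0)
print(counts)
```

The mask holds n x n booleans per group, so this trades memory for simplicity and may not suit accounts with a very large number of rows.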


df = df.resample('30D').agg({'Date':'count','Amount':'sum'})

This will aggregate the 'Date' column by count within each 30-day bin. Note that resample uses fixed, non-overlapping bins rather than a window that trails each row.

However, since you will need to first set date as your index for resampling, you could create a "dummy" column containing zeros:

df['dummy'] = pd.Series(np.zeros(len(df)))
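Putting the dummy-column idea together, a minimal sketch might look like this. The sample frame is an assumption rebuilt from the dates shown in the other answers, and bin_counts is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Assumed sample data, matching the dates shown in the outputs above.
dates = ["2020-06-01", "2020-06-11", "2020-06-21",
         "2020-06-25", "2020-07-11", "2020-07-15"]
df = pd.DataFrame({
    "Account": [10] * 6 + [11] * 6,
    "Date": pd.to_datetime(dates * 2),
})

# Placeholder column to count, since Date itself becomes the index.
df["dummy"] = np.zeros(len(df))

# Move Date into the index, sort it, and count rows per 30-day bin.
# The first bin starts at midnight of the earliest date in the data.
bin_counts = df.set_index("Date").sort_index().resample("30D")["dummy"].count()
print(bin_counts)
```

With this data the bins start on 2020-06-01 and 2020-07-01 and hold 8 and 4 rows across both accounts. To get per-account bins, one option would be to group by Account before resampling.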