Number of rows in a rolling window of 30 days
def get_rolling_amount(grp, freq):
    return grp.rolling(freq, on="Date", closed="both").count()

df["Date"] = pd.to_datetime(df["Date"])
df["Amount"] = df.groupby("Account").apply(get_rolling_amount, "30D").values
print(df)
Prints:
    Account       Date  Amount
0        10 2020-06-01       1
1        10 2020-06-11       2
2        10 2020-06-21       3
3        10 2020-06-25       4
4        10 2020-07-11       4
5        10 2020-07-15       4
6        11 2020-06-01       1
7        11 2020-06-11       2
8        11 2020-06-21       3
9        11 2020-06-25       4
10       11 2020-07-11       4
11       11 2020-07-15       4
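For anyone who wants to reproduce the output, here is a minimal self-contained sketch. The sample frame is rebuilt from the printed result, and it swaps in `groupby(...).rolling(...)` on a helper column of ones (the name `n` is made up for illustration) instead of the `apply` helper. It assumes the rows are already sorted by Account and then by Date, so the grouped result lines up with the original row order.

```python
import pandas as pd

# Sample data reconstructed from the printed output (an assumption,
# the original input frame is not shown).
dates = ["2020-06-01", "2020-06-11", "2020-06-21",
         "2020-06-25", "2020-07-11", "2020-07-15"]
df = pd.DataFrame({
    "Account": [10] * 6 + [11] * 6,
    "Date": pd.to_datetime(dates * 2),
})

# Sum a column of ones over a 30-day window that includes both endpoints.
counts = (
    df.assign(n=1)              # hypothetical helper column
      .set_index("Date")
      .groupby("Account")["n"]
      .rolling("30D", closed="both")
      .sum()
)

# The result is ordered by Account, then Date, matching df's row order.
df["Amount"] = counts.to_numpy().astype(int)
print(df)
```

With `closed="both"`, a row dated exactly 30 days earlier is still counted, which is why 2020-07-11 gets 4 rather than 3.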
You can use broadcasting within each group to count, for every row, how many rows fall in the X days up to and including that row's date.
import pandas as pd

def within_days(s, days):
    arr = ((s.to_numpy() >= s.to_numpy()[:, None])
           & (s.to_numpy() <= (s + pd.offsets.DateOffset(days=days)).to_numpy()[:, None])).sum(axis=0)
    return pd.Series(arr, index=s.index)

df['Amount'] = df.groupby('Account')['Date'].apply(within_days, days=30)
    Account       Date  Amount
0        10 2020-06-01       1
1        10 2020-06-11       2
2        10 2020-06-21       3
3        10 2020-06-25       4
4        10 2020-07-11       4
5        10 2020-07-15       4
6        11 2020-06-01       1
7        11 2020-06-11       2
8        11 2020-06-21       3
9        11 2020-06-25       4
10       11 2020-07-11       4
11       11 2020-07-15       4
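To make the broadcasting step easier to follow, here is a rough sketch of the same comparison written as an explicit pairwise difference matrix. The function name `count_within_days` and the per-group loop are illustrative choices, not part of the answer above, and the sample frame is rebuilt from the printed output.

```python
import numpy as np
import pandas as pd

def count_within_days(dates, days):
    """For each date, count the dates in the `days` days up to and including it."""
    d = dates.to_numpy()                       # datetime64[ns] array, shape (n,)
    window = np.timedelta64(days, "D")
    # diff[i, j] = d[j] - d[i], built by broadcasting (1, n) against (n, 1)
    diff = d[None, :] - d[:, None]
    inside = (diff >= np.timedelta64(0, "D")) & (diff <= window)
    # Column j counts the rows i with d[j] - days <= d[i] <= d[j]
    return inside.sum(axis=0)

# Sample data reconstructed from the printed output (an assumption).
dates = ["2020-06-01", "2020-06-11", "2020-06-21",
         "2020-06-25", "2020-07-11", "2020-07-15"]
df = pd.DataFrame({
    "Account": [10] * 6 + [11] * 6,
    "Date": pd.to_datetime(dates * 2),
})

# Fill the counts group by group using the row labels of each account.
df["Amount"] = 0
for _, idx in df.groupby("Account").groups.items():
    df.loc[idx, "Amount"] = count_within_days(df.loc[idx, "Date"], 30)
```

The matrix has n × n entries per group, so this is only practical when each account has a modest number of rows.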
df = df.resample('30D').agg({'Date':'count','Amount':'sum'})
This will aggregate the 'Date' column by count, getting the data you want.
However, since you will need to first set date as your index for resampling, you could create a "dummy" column containing zeros:
df['dummy'] = pd.Series(np.zeros(len(df)))
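Putting the resampling steps together, a minimal sketch might look like the following. It assumes a single account and sample dates taken from the output shown earlier, and it assigns the zeros as a plain NumPy array so the values do not depend on index alignment. Keep in mind that `resample('30D')` groups rows into consecutive, non-overlapping 30-day bins starting at the first date, so it yields one count per bin rather than one trailing-window count per row.

```python
import numpy as np
import pandas as pd

# Sample dates for one account (an assumption for illustration).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-06-01", "2020-06-11", "2020-06-21",
                            "2020-06-25", "2020-07-11", "2020-07-15"]),
})

# Dummy column to count, since Date itself becomes the index.
df["dummy"] = np.zeros(len(df))

# Set Date as the index, then count rows in each 30-day bin.
binned = df.set_index("Date").resample("30D").agg({"dummy": "count"})
print(binned)
```

Here the first bin covers 2020-06-01 up to (but not including) 2020-07-01 and the second starts at 2020-07-01.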