Pandas resample with start date
My answer feels a little hacky, but it uses resample and gives the desired output. Find the date one bin length (e.g. 4 months, or month ends specifically) before the specified date, append it to s, and then resample:
    rule = '4M'
    date = '02-29-2020'
    base_date = pd.to_datetime(date) - pd.tseries.frequencies.to_offset(rule)
    s.loc[base_date] = np.nan
    output = s.resample(rule=rule, label='right').count()
    output = output[output.index >= date]
Result:
    2020-02-29     32
    2020-06-30    122
    2020-10-31    123
    2021-02-28    120
    2021-06-30    122
    2021-10-31      4
    Freq: 4M, dtype: int64
I added output = output[output.index >= date] because otherwise you get an additional empty bin:
    2019-10-31      0
    2020-02-29     32
    2020-06-30    122
    2020-10-31    123
    2021-02-28    120
    2021-06-30    122
    2021-10-31      4
    Freq: 4M, dtype: int64
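For a quick check of the anchor-row trick, here is a minimal self-contained sketch. The daily series is an assumed stand-in for the question's s (one value per day from 2020-01-29 to 2021-07-04, which is a guess at the original data), and pd.offsets.MonthEnd(4) is used in place of the '4M' string because newer pandas versions deprecate the 'M' alias.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the question's data: one value per day.
idx = pd.date_range("2020-01-29", "2021-07-04", freq="D")
s = pd.Series(1.0, index=idx)

step = pd.offsets.MonthEnd(4)           # same bin length as rule = '4M'
date = pd.Timestamp("2020-02-29")       # desired label of the first bin

# Anchor row exactly one bin length before the desired first label.
base_date = date - step
s.loc[base_date] = np.nan               # NaN, so count() ignores it
s = s.sort_index()                      # keep the index monotonic

output = s.resample(step, label="right").count()
output = output[output.index >= date]   # drop the empty anchor bin
print(output)
```

With this stand-in data the filtered result starts at 2020-02-29 and the counts are 32, 122, 123, 120, 122 and 4. The anchor row only serves to pin the bin edges; because it holds NaN it never contributes to a count.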
All you need to use is pd.cut, like below:

    >>> gb = pd.cut(s.index, bins).value_counts()
    >>> gb.index = gb.index.categories.right
    >>> gb
    2020-02-29     32
    2020-06-30    122
    2020-10-31    123
    2021-02-28    120
    2021-06-30    122
    2021-10-31      4
    dtype: int64

There is no need to use groupby.
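The pd.cut snippet does not show how bins is built. One possible construction, as a sketch only: take month-end edges every 4 months, starting one bin length before the desired first label and running past the last observation. The daily series and the edge dates below are assumptions for illustration, not taken from the question.

```python
import pandas as pd

# Assumed stand-in for the question's data: one value per day.
idx = pd.date_range("2020-01-29", "2021-07-04", freq="D")
s = pd.Series(1.0, index=idx)

step = pd.offsets.MonthEnd(4)
start = pd.Timestamp("2020-02-29")      # desired label of the first bin

# Bin edges: one step before the first label, then every 4 month ends
# until the last observation is covered.
bins = pd.date_range(start - step, s.index.max() + step, freq=step)

# pd.cut builds right-closed intervals (edge[i], edge[i + 1]] by default.
gb = pd.cut(s.index, bins).value_counts()
gb.index = gb.index.categories.right    # label each bin by its right edge
print(gb)
```

Because the edges are explicit, the first label is whatever the second edge is, so no anchor row or post-filtering is needed.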
Another way, when dealing with month intervals, is to convert the datetime index to an integer built from the year and month, subtract the same integer for the defined start date, and floor-divide by the number of months in the rule. Use the result as the key in a groupby.
    rule = '4M'
    start = "2020-02-29"

    # change types of value
    d = pd.Timestamp(start)
    nb = int(rule[:-1])

    gr = s.groupby(
        d + (1 + ((s.index.year*12 + s.index.month)     # convert datetime index to int
                  - (d.year*12 + d.month + 1)) // nb)   # subtract start, floor-divide by rule
            * pd.tseries.frequencies.to_offset(rule)    # get rule freq
    ).count()
    print(gr)

    2020-02-29     32
    2020-06-30    121
    2020-10-31    123
    2021-02-28    120
    2021-06-30    122
    2021-10-31      4
    dtype: int64
Now, compared to your method, suppose the date you want does not fall within the first X months defined by your rule, for example 2020-07-31 with the same rule (4M). With this method, you get:
    2020-03-31     63   # you get this interval
    2020-07-31    121
    2020-11-30    122
    2021-03-31    121
    2021-07-31     95
    dtype: int64
while with your method, you get:
    2020-07-31    121   # you lose the info from before 2020-03-31
    2020-11-30    122
    2021-03-31    121
    2021-07-31     95
    dtype: int64
I know you stated in the question that you define the first date, but with this method you can define any date, as long as the rule is in months.
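The month arithmetic behind the groupby key can be checked without pandas. The sketch below uses a hypothetical helper, bin_label, that maps a day to the month-end label of its bin. It assumes the start date is itself a month end, and the daily date range is made up for illustration.

```python
import calendar
from collections import Counter
from datetime import date, timedelta

def bin_label(day, start, nb):
    """Month-end label of the nb-month bin containing `day` (hypothetical helper)."""
    # Months as integers (year * 12 + month); floor division also
    # handles days that fall before the start date.
    k = 1 + ((day.year * 12 + day.month) - (start.year * 12 + start.month + 1)) // nb
    # Shift the start month by k bins and take the last day of that month.
    year, month0 = divmod(start.year * 12 + (start.month - 1) + k * nb, 12)
    return date(year, month0 + 1, calendar.monthrange(year, month0 + 1)[1])

# Assumed daily data: 2020-01-29 through 2021-07-04.
first, last = date(2020, 1, 29), date(2021, 7, 4)
days = [first + timedelta(n) for n in range((last - first).days + 1)]

counts = Counter(bin_label(day, date(2020, 2, 29), 4) for day in days)
for label in sorted(counts):
    print(label, counts[label])

# A start that is not in the first 4 months still yields an earlier bin.
shifted = Counter(bin_label(day, date(2020, 7, 31), 4) for day in days)
print(min(shifted), shifted[min(shifted)])
```

Since the bin number is computed directly from the month difference, days before the start date get a negative offset and land in an earlier bin instead of being dropped.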