
Pandas resample with start date


My answer feels a little hacky, but it uses resample and gives the desired output. Find the date one bin length (e.g. 4 months, or more precisely 4 month ends) before the specified date, append it to s as a NaN, and then resample:

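    import numpy as np
    import pandas as pd

    rule = '4M'  # spelled '4ME' in pandas >= 2.2, where 'M' is deprecated
    date = '02-29-2020'
    # one bin length before `date`, i.e. 2019-10-31
    base_date = pd.to_datetime(date) - pd.tseries.frequencies.to_offset(rule)
    s.loc[base_date] = np.nan  # note: this modifies s in place
    output = s.resample(rule=rule, label='right').count()
    output = output[output.index >= date]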

Result:

    2020-02-29     32
    2020-06-30    122
    2020-10-31    123
    2021-02-28    120
    2021-06-30    122
    2021-10-31      4
    Freq: 4M, dtype: int64

I added `output = output[output.index >= date]` because otherwise you get an additional empty bin:

    2019-10-31      0
    2020-02-29     32
    2020-06-30    122
    2020-10-31    123
    2021-02-28    120
    2021-06-30    122
    2021-10-31      4
    Freq: 4M, dtype: int64
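As a self-contained sketch of the same padding idea (not the answer's exact code): the daily series below is made up, the rule is passed as a `pd.offsets.MonthEnd(4)` object instead of the `'4M'` string so it does not depend on the alias spelling, and `pd.concat` is used so that `s` itself is not modified. The comment on why the padding works reflects my reading of how `resample` builds bin edges for month-end offsets, so treat it as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data standing in for the question's `s`.
idx = pd.date_range("2020-01-29", "2021-07-04", freq="D")
s = pd.Series(np.arange(len(idx)), index=idx)

# MonthEnd(4) is the offset behind the alias '4M' ('4ME' from pandas 2.2 on).
rule = pd.offsets.MonthEnd(4)
start = pd.Timestamp("2020-02-29")

# Pad with a NaN one bin length before `start` (2019-10-31 here).
# Assumption: s begins after base_date, so the NaN becomes the first
# timestamp, which resample appears to build its bin edges from.
base_date = start - rule
pad = pd.Series([np.nan], index=pd.DatetimeIndex([base_date]))
padded = pd.concat([s, pad]).sort_index()

# count() skips NaN, so the padding adds no observations; it only creates
# an empty bin ending at base_date, which is dropped on the next line.
output = padded.resample(rule, closed="right", label="right").count()
output = output[output.index >= start]
print(output)
```

Filtering with `>= start` keeps the bin that ends on `start` itself, because the bins are closed on the right.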


All you need is pd.cut, like below (bins being the sequence of bin edges):

    >>> gb = pd.cut(s.index, bins).value_counts()
    >>> gb.index = gb.index.categories.right
    >>> gb
    2020-02-29     32
    2020-06-30    122
    2020-10-31    123
    2021-02-28    120
    2021-06-30    122
    2021-10-31      4
    dtype: int64

There is no need to use groupby.
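A runnable sketch of the `pd.cut` approach, under two assumptions: `s` is a made-up daily series, and `bins` (which the answer uses without defining) is built here as month ends four months apart, starting one bin before 2020-02-29. Instead of assigning `categories.right` wholesale, this version labels each count with its own interval's right edge, so the labels stay correct even if the counts come back in a different order.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data standing in for the question's `s`.
idx = pd.date_range("2020-01-29", "2021-07-04", freq="D")
s = pd.Series(np.arange(len(idx)), index=idx)

# Hypothetical bin edges: month ends 4 months apart, starting one bin
# before 2020-02-29 so that the first interval ends on that date.
bins = pd.date_range("2019-10-31", periods=7, freq=pd.offsets.MonthEnd(4))

# pd.cut assigns each timestamp to a right-closed interval (left, right].
cats = pd.cut(s.index, bins)

# sort=False keeps the categorical counts in interval order, not by size.
gb = pd.Series(cats).value_counts(sort=False)

# Label each count with the right edge of its own interval.
gb.index = pd.DatetimeIndex([interval.right for interval in gb.index])
print(gb)
```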


Another way, when the intervals are a whole number of months, is to convert the datetime index to an integer month number (year * 12 + month), subtract the month number of the start date, floor-divide by the number of months in the rule, and use the result in a groupby.

    import pandas as pd

    rule = '4M'
    start = "2020-02-29"
    # convert the inputs to usable types
    d = pd.Timestamp(start)
    nb = int(rule[:-1])  # number of months in the rule
    gr = s.groupby(d + (1 + ((s.index.year*12 + s.index.month)  # datetime index -> month number
                             - (d.year*12 + d.month + 1)) // nb)  # subtract the start, floor-divide by the rule
                   * pd.tseries.frequencies.to_offset(rule)  # scale the rule's offset
                   ).count()
    print(gr)

    2020-02-29     32
    2020-06-30    121
    2020-10-31    123
    2021-02-28    120
    2021-06-30    122
    2021-10-31      4
    dtype: int64

Now compare this with your method. Suppose the date you define does not fall within the first X months defined by your rule, e.g. 2020-07-31 with the same rule (4M). With this method, you get:

    2020-03-31     63  # you get this interval
    2020-07-31    121
    2020-11-30    122
    2021-03-31    121
    2021-07-31     95
    dtype: int64

while with your method, you get:

    2020-07-31    121  # you lose the data up to 2020-03-31
    2020-11-30    122
    2021-03-31    121
    2021-07-31     95
    dtype: int64

I know you stated in the question that you define the first date, but with this method you can use any date, as long as the rule is expressed in months.
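The month-number arithmetic can be wrapped in a small function and checked against both start dates discussed above. This is a sketch: `count_month_bins` is a hypothetical helper name, the data is made up (so the counts need not match the outputs printed above), and rather than adding a `DateOffset` to every row it groups by the integer bin number and labels each group once afterwards. Rolling `start` forward to its month end is also an addition of mine, not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data standing in for the question's `s`.
idx = pd.date_range("2020-01-29", "2021-07-04", freq="D")
s = pd.Series(np.arange(len(idx)), index=idx)


def count_month_bins(s, start, n_months):
    """Hypothetical helper: count observations in n_months-wide bins
    whose right edges are month ends aligned with `start`."""
    # Roll `start` forward to its month end (a no-op if it already is one).
    d = pd.Timestamp(start) + pd.offsets.MonthEnd(0)
    # Convert each date to a running month number: year * 12 + month.
    month_no = s.index.year * 12 + s.index.month
    start_no = d.year * 12 + d.month
    # Bin k covers months start_no + (k - 1) * n + 1 .. start_no + k * n,
    # so bin 0 ends at `start`; floor division also gives negative k for
    # months before `start`, so earlier data is kept rather than dropped.
    k = (month_no - start_no - 1) // n_months + 1
    counts = s.groupby(np.asarray(k)).count()
    # Label bin k with the month end k * n_months months after `start`.
    counts.index = pd.DatetimeIndex(
        [d + pd.offsets.MonthEnd(n_months * int(i)) for i in counts.index]
    )
    return counts


print(count_month_bins(s, "2020-02-29", 4))
print(count_month_bins(s, "2020-07-31", 4))
```

With the 2020-07-31 start, the first bin ends on 2020-03-31, so the data before April 2020 still shows up in the result.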