Fill dates on dataframe within groups with same ending
The key here is to find the min date within each group and the global max date, then create the date range, `explode` it and `merge` it back.
```python
import pandas as pd

# find the min date for each shop under each item
s = df.groupby(['item', 'shop'])[['date']].min()
# find the global max
s['datemax'] = df['date'].max()
# combine the two results
s['date'] = [pd.date_range(x, y) for x, y in zip(s['date'], s['datemax'])]
out = s.explode('date').reset_index().merge(df, how='left').fillna(0)
out
```

```
    item shop       date    datemax   qty
0      1    A 2018-01-02 2018-01-05   5.0
1      1    A 2018-01-03 2018-01-05   6.0
2      1    A 2018-01-04 2018-01-05   0.0
3      1    A 2018-01-05 2018-01-05   0.0
4      1    B 2018-01-04 2018-01-05   9.0
5      1    B 2018-01-05 2018-01-05  10.0
6      2    A 2018-01-01 2018-01-05   7.0
7      2    A 2018-01-02 2018-01-05   0.0
8      2    A 2018-01-03 2018-01-05   0.0
9      2    A 2018-01-04 2018-01-05   8.0
10     2    A 2018-01-05 2018-01-05   0.0
```
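The answer above assumes `df` already exists. As a self-contained sketch, assuming the question's data is the raw sample from the "Initial data" answer below, the same idea (each group's own first date, the global last date, a left merge back, then 0 for missing days) can be written as follows. Note that it swaps `explode` for a `pd.concat` of one small frame per group; the function name `fill_to_global_max` is hypothetical.

```python
import pandas as pd

# Assumed sample data: the raw frame from the "Initial data" answer
df = pd.DataFrame({
    "item": [1, 1, 2, 2, 1, 1],
    "shop": ["A", "A", "A", "A", "B", "B"],
    "date": pd.to_datetime([f"2018-01-0{d}" for d in [2, 3, 1, 4, 4, 5]]),
    "qty": [5, 6, 7, 8, 9, 10],
})


def fill_to_global_max(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per day for every (item, shop) group,
    from the group's own first date to the latest date in the whole frame."""
    # Each group starts at its own first date ...
    starts = df.groupby(["item", "shop"])["date"].min()
    # ... and every group ends at the same global last date
    end = df["date"].max()
    # One small frame per group covering start..end, stacked into one long frame
    full = pd.concat(
        [pd.DataFrame({"item": item, "shop": shop, "date": pd.date_range(start, end)})
         for (item, shop), start in starts.items()],
        ignore_index=True,
    )
    # Left merge keeps every generated row; days without sales get NaN, then 0
    out = full.merge(df, on=["item", "shop", "date"], how="left")
    out["qty"] = out["qty"].fillna(0).astype(int)
    return out


print(fill_to_global_max(df))
```

Building the per-group frames explicitly avoids storing whole `DatetimeIndex` objects inside a column, at the cost of a Python-level loop over groups.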
I think this gives you what you want (the columns are ordered differently):
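```python
import pandas as pd

max_date = df.date.max()

def reindex_to_max_date(group):
    # one row per day from this group's first date to the global max date;
    # days with no data get 0
    return group.set_index('date').reindex(
        pd.date_range(group.date.min(), max_date, name='date'), fill_value=0)

res = df.groupby(['shop', 'item']).apply(reindex_to_max_date)
res = res.qty.reset_index()
```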
I grouped by `shop`, `item` to give the same sort order as in your `out`, but these can be swapped.
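The remark that the grouping keys can be swapped can be checked with a small self-contained sketch. It assumes the raw sample data from the "Initial data" answer, selects only `qty` inside the (slightly reworded) reindexing function, and compares the two key orders after sorting one result into the other's order.

```python
import pandas as pd

# Assumed sample data: the raw frame from the "Initial data" answer
df = pd.DataFrame({
    "item": [1, 1, 2, 2, 1, 1],
    "shop": ["A", "A", "A", "A", "B", "B"],
    "date": pd.to_datetime([f"2018-01-0{d}" for d in [2, 3, 1, 4, 4, 5]]),
    "qty": [5, 6, 7, 8, 9, 10],
})
max_date = df["date"].max()


def reindex_to_max_date(group: pd.DataFrame) -> pd.DataFrame:
    # One row per day from this group's first date to the global max date
    full_range = pd.date_range(group["date"].min(), max_date, name="date")
    # Days with no sales get qty 0
    return group.set_index("date")[["qty"]].reindex(full_range, fill_value=0)


cols = ["item", "shop", "date", "qty"]
by_shop_item = df.groupby(["shop", "item"]).apply(reindex_to_max_date).reset_index()[cols]
by_item_shop = df.groupby(["item", "shop"]).apply(reindex_to_max_date).reset_index()[cols]

# Same rows, different order: sort one result into the other's order and compare
resorted = by_shop_item.sort_values(["item", "shop", "date"]).reset_index(drop=True)
print(resorted.equals(by_item_shop))
```

Only the row order depends on the key order; the set of rows and their `qty` values are the same.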
Not sure if this is the most efficient way, but one idea is to create a dataframe with all the dates and do a left join at the shop-item level, as follows.
Initial data
```python
import pandas as pd

df = pd.DataFrame({'item': [1, 1, 2, 2, 1, 1],
                   'shop': ['A', 'A', 'A', 'A', 'B', 'B'],
                   'date': pd.to_datetime(['2018.01.' + str(x) for x in [2, 3, 1, 4, 4, 5]]),
                   'qty': [5, 6, 7, 8, 9, 10]})

df = df.set_index('date')\
       .groupby(['item', 'shop'])\
       .resample("D")['qty']\
       .sum()\
       .reset_index(name='qty')
```
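Note that `resample("D")` only fills the gaps inside each group's own date span; it does not extend a group to the latest date in the frame, which is why the steps below are still needed. A quick check on the same sample data (selecting `qty` before `resample`, a small variation on the code above):

```python
import pandas as pd

raw = pd.DataFrame({
    "item": [1, 1, 2, 2, 1, 1],
    "shop": ["A", "A", "A", "A", "B", "B"],
    "date": pd.to_datetime([f"2018-01-0{d}" for d in [2, 3, 1, 4, 4, 5]]),
    "qty": [5, 6, 7, 8, 9, 10],
})

# Daily resampling per (item, shop): empty days inside a group's span sum to 0
daily = (
    raw.set_index("date")
       .groupby(["item", "shop"])["qty"]
       .resample("D")
       .sum()
       .reset_index(name="qty")
)
print(daily)

# Last date reached by each group after resampling
print(daily.groupby(["item", "shop"])["date"].max())
```

Here item 2 in shop A gains zero rows for 2018-01-02 and 2018-01-03, but its last row is still 2018-01-04.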
Dataframe with all dates
We first get the min and max dates:

```python
# a list (rather than a set) keeps the order of "min" and "max" fixed
rg = df.agg({"date": ["min", "max"]})
```
Then we create a df with all possible dates:

```python
df_dates = pd.DataFrame({"date": pd.date_range(start=rg["date"]["min"],
                                               end=rg["date"]["max"])})
```
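To make the shape of `rg` concrete: with a list of function names, `agg` returns a small frame indexed by `min` and `max`, and `pd.date_range` includes both endpoints. A sketch on three made-up dates (the frame name `dates` is hypothetical), also showing the equivalent direct call without `rg`:

```python
import pandas as pd

dates = pd.DataFrame({"date": pd.to_datetime(["2018-01-03", "2018-01-01", "2018-01-05"])})

# With a list of function names, agg returns a frame indexed by "min" and "max"
rg = dates.agg({"date": ["min", "max"]})
print(rg)

# date_range includes both endpoints: 2018-01-01 through 2018-01-05
df_dates = pd.DataFrame({"date": pd.date_range(start=rg["date"]["min"], end=rg["date"]["max"])})

# The same range built directly from the column, without the intermediate frame
direct = pd.date_range(dates["date"].min(), dates["date"].max())
```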
Complete dates
Now, for every shop-item pair, we do a left join of all possible dates with that pair's rows:
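```python
def complete_dates(x, df_dates):
    # remember this group's item and shop before the join
    item = x["item"].iloc[0]
    shop = x["shop"].iloc[0]
    # every date in df_dates is kept; dates without a row get NaN
    x = pd.merge(df_dates, x, on=["date"], how="left")
    # fill item/shop back in for the newly added dates
    x["item"] = item
    x["shop"] = shop
    return x
```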
Finally, we apply this function to `df`:
```python
df.groupby(["item", "shop"])\
  .apply(lambda x: complete_dates(x, df_dates))\
  .reset_index(drop=True)
```
```
         date  item shop   qty
0  2018-01-01     1    A   NaN
1  2018-01-02     1    A   5.0
2  2018-01-03     1    A   6.0
3  2018-01-04     1    A   NaN
4  2018-01-05     1    A   NaN
5  2018-01-01     1    B   NaN
6  2018-01-02     1    B   NaN
7  2018-01-03     1    B   NaN
8  2018-01-04     1    B   9.0
9  2018-01-05     1    B  10.0
10 2018-01-01     2    A   7.0
11 2018-01-02     2    A   0.0
12 2018-01-03     2    A   0.0
13 2018-01-04     2    A   8.0
14 2018-01-05     2    A   NaN
```
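The output above starts every group at the earliest date in the whole frame and leaves `NaN` for missing days. If each group should instead start at its own first date, with 0 for missing days as in the other answers, one possible post-processing step is sketched below. It is not part of the original answer: it loops over `groupby` instead of using `apply`, and the names `raw`, `daily`, `pieces` and `completed` are illustrative.

```python
import pandas as pd

raw = pd.DataFrame({
    "item": [1, 1, 2, 2, 1, 1],
    "shop": ["A", "A", "A", "A", "B", "B"],
    "date": pd.to_datetime([f"2018-01-0{d}" for d in [2, 3, 1, 4, 4, 5]]),
    "qty": [5, 6, 7, 8, 9, 10],
})
# Same daily resampling as in the "Initial data" step
daily = (raw.set_index("date")
            .groupby(["item", "shop"])["qty"]
            .resample("D")
            .sum()
            .reset_index(name="qty"))

# All calendar dates between the global min and max (inclusive)
df_dates = pd.DataFrame({"date": pd.date_range(daily["date"].min(), daily["date"].max())})

pieces = []
for (item, shop), group in daily.groupby(["item", "shop"]):
    # Left join: every calendar date, with qty where this group has a row
    merged = df_dates.merge(group[["date", "qty"]], on="date", how="left")
    # Drop dates before this group's own first date
    merged = merged[merged["date"] >= group["date"].min()].copy()
    merged["item"] = item
    merged["shop"] = shop
    pieces.append(merged)

completed = pd.concat(pieces, ignore_index=True)
# Days with no row in the group become 0 instead of NaN
completed["qty"] = completed["qty"].fillna(0).astype(int)
print(completed)
```

The group keys come from the `groupby` iteration itself, so the loop does not rely on the grouping columns being present inside each group.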