
Fill dates on dataframe within groups with same ending


The key here is to compute the minimum date within each group and the global maximum date, then build a date range for each group, explode it, and merge the original data back.

```python
# find the min date for each shop under each item
s = df.groupby(['item','shop'])[['date']].min()
# find the global max
s['datemax'] = df['date'].max()
# build one date range per group, from its own min to the global max
s['date'] = [pd.date_range(x, y) for x, y in zip(s['date'], s['datemax'])]
# one row per date, then merge qty back in; dates without sales get 0
out = s.explode('date').reset_index().merge(df, how='left').fillna(0)
out
```

```
    item shop       date    datemax   qty
0      1    A 2018-01-02 2018-01-05   5.0
1      1    A 2018-01-03 2018-01-05   6.0
2      1    A 2018-01-04 2018-01-05   0.0
3      1    A 2018-01-05 2018-01-05   0.0
4      1    B 2018-01-04 2018-01-05   9.0
5      1    B 2018-01-05 2018-01-05  10.0
6      2    A 2018-01-01 2018-01-05   7.0
7      2    A 2018-01-02 2018-01-05   0.0
8      2    A 2018-01-03 2018-01-05   0.0
9      2    A 2018-01-04 2018-01-05   8.0
10     2    A 2018-01-05 2018-01-05   0.0
```
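The snippet above uses the question's `df`, which is not reproduced on this page. Here is a self-contained sketch of the same explode-and-merge approach. It assumes the sample data built in the last answer (columns `item`, `shop`, `date`, `qty`) and, as a precaution, converts the exploded column with `pd.to_datetime` so both merge keys are `datetime64`:

```python
import pandas as pd

# Sample data (assumed to match the question; same values as in the last answer)
df = pd.DataFrame({
    "item": [1, 1, 2, 2, 1, 1],
    "shop": ["A", "A", "A", "A", "B", "B"],
    "date": pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-01",
                            "2018-01-04", "2018-01-04", "2018-01-05"]),
    "qty": [5, 6, 7, 8, 9, 10],
})

# Earliest date of each (item, shop) group
s = df.groupby(["item", "shop"])[["date"]].min()
# Common end date: the global maximum
s["datemax"] = df["date"].max()
# One DatetimeIndex per group, from its own start to the common end
s["date"] = [pd.date_range(x, y) for x, y in zip(s["date"], s["datemax"])]

out = s.explode("date").reset_index()
# Precaution: make sure the exploded column is datetime64 before merging on it
out["date"] = pd.to_datetime(out["date"])
# Left merge on item, shop and date; days without a sale get qty 0
out = out.merge(df, how="left").fillna(0)
print(out)
```

The printed frame should have the same 11 rows as the output above.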


I think this gives you what you want (the columns are ordered differently):

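```python
max_date = df.date.max()

def reindex_to_max_date(df):
    # df is one (shop, item) group: index it by date and fill every day
    # from the group's first date to the global max, using 0 for new rows
    return df.set_index('date').reindex(pd.date_range(df.date.min(), max_date, name='date'), fill_value=0)

res = df.groupby(['shop', 'item']).apply(reindex_to_max_date)
res = res.qty.reset_index()
```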

I grouped by `shop`, `item` to get the same sort order as your expected `out`, but the two keys can be swapped.
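The same reindexing idea can also be done without `groupby.apply`, by building the full set of `(shop, item, date)` labels up front and reindexing once. This is a sketch of that variant, not the answer's own code. It assumes the sample data from the last answer; `starts` and `full_index` are illustrative names:

```python
import pandas as pd

# Sample data (assumed; same values as in the last answer)
df = pd.DataFrame({
    "item": [1, 1, 2, 2, 1, 1],
    "shop": ["A", "A", "A", "A", "B", "B"],
    "date": pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-01",
                            "2018-01-04", "2018-01-04", "2018-01-05"]),
    "qty": [5, 6, 7, 8, 9, 10],
})

max_date = df["date"].max()
# First date of each (shop, item) group
starts = df.groupby(["shop", "item"])["date"].min()

# Every (shop, item, date) label from each group's first date up to the global max
full_index = pd.MultiIndex.from_tuples(
    [(shop, item, d)
     for (shop, item), start in starts.items()
     for d in pd.date_range(start, max_date)],
    names=["shop", "item", "date"],
)

# A single reindex: labels missing from df get qty 0
res = (df.set_index(["shop", "item", "date"])["qty"]
         .reindex(full_index, fill_value=0)
         .reset_index())
print(res)
```

Because `fill_value=0` is used, `qty` stays an integer column instead of turning into floats.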


I'm not sure this is the most efficient way, but one idea is to create a dataframe with all the dates and do a left join at the shop-item level, as follows.
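The idea of "all dates × every shop-item pair, then a left join" can also be sketched with a single cross join instead of a per-group function. This assumes pandas 1.2 or later for `how="cross"`, reuses the sample data below, and spans the global date range as this answer does; `all_dates`, `pairs`, `grid` and `full` are illustrative names:

```python
import pandas as pd

# Sample data (same values as the initial data below, without the resample step)
df = pd.DataFrame({
    "item": [1, 1, 2, 2, 1, 1],
    "shop": ["A", "A", "A", "A", "B", "B"],
    "date": pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-01",
                            "2018-01-04", "2018-01-04", "2018-01-05"]),
    "qty": [5, 6, 7, 8, 9, 10],
})

# All dates between the global min and max
all_dates = pd.DataFrame({"date": pd.date_range(df["date"].min(), df["date"].max())})
# Distinct shop-item pairs
pairs = df[["item", "shop"]].drop_duplicates()

# Cartesian product of pairs and dates (needs pandas >= 1.2)
grid = pairs.merge(all_dates, how="cross")
# Left join the observed quantities; missing combinations get NaN
full = (grid.merge(df, on=["item", "shop", "date"], how="left")
            .sort_values(["item", "shop", "date"], ignore_index=True))
print(full)
```

Without the resample step, days inside a group's own span (for example item 2, shop A on Jan 2 and 3) come out as NaN rather than 0, so a `fillna(0)` may be wanted afterwards.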

Initial data

```python
import pandas as pd

df = pd.DataFrame({'item': [1,1,2,2,1,1],
                   'shop': ['A','A','A','A','B','B'],
                   'date': pd.to_datetime(['2018.01.'+ str(x)
                                           for x in [2,3,1,4,4,5]]),
                   'qty': [5,6,7,8,9,10]})

df = df.set_index('date')\
       .groupby(['item', 'shop'])\
       .resample("D")['qty']\
       .sum()\
       .reset_index(name='qty')
```
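The `resample("D")` step does more than reshape: within each `(item, shop)` group it inserts every missing day between that group's first and last date, and `sum()` puts 0 in the new rows. A quick check on the same sample data (`raw` and `daily` are illustrative names):

```python
import pandas as pd

# Same sample data as above, before resampling
raw = pd.DataFrame({
    "item": [1, 1, 2, 2, 1, 1],
    "shop": ["A", "A", "A", "A", "B", "B"],
    "date": pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-01",
                            "2018-01-04", "2018-01-04", "2018-01-05"]),
    "qty": [5, 6, 7, 8, 9, 10],
})

# Daily frequency per (item, shop); days with no rows sum to 0
daily = (raw.set_index("date")
            .groupby(["item", "shop"])
            .resample("D")["qty"]
            .sum()
            .reset_index(name="qty"))

# Item 2 / shop A had sales on Jan 1 and Jan 4 only; Jan 2 and Jan 3 now appear with qty 0
sub = daily[(daily["item"] == 2) & (daily["shop"] == "A")]
print(sub)
```

Note that the filling stops at each group's own last date, which is why the final result still needs the join against all dates.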

Dataframe with all dates

We first get the max and min date

```python
rg = df.agg({"date": ["min", "max"]})
```

and then we create a df with all possible dates

```python
df_dates = pd.DataFrame(
    {"date": pd.date_range(
        start=rg["date"]["min"],
        end=rg["date"]["max"])
    })
```
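For reference, `rg` is a small frame indexed by `"min"` and `"max"` with a single `date` column, and `pd.date_range` includes both endpoints. On the sample data the range runs from 2018-01-01 to 2018-01-05, so `df_dates` has five rows. A small check, assuming only the `date` column is needed for this step (`same` is an illustrative name for the shorter equivalent):

```python
import pandas as pd

# Only the date column matters for this step (sample dates from above)
df = pd.DataFrame({"date": pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-01",
                                           "2018-01-04", "2018-01-04", "2018-01-05"])})

# rg: index ["min", "max"], one column "date"
rg = df.agg({"date": ["min", "max"]})
df_dates = pd.DataFrame({"date": pd.date_range(start=rg["date"]["min"],
                                               end=rg["date"]["max"])})

# Equivalent range without the intermediate frame
same = pd.date_range(df["date"].min(), df["date"].max())
print(rg)
print(df_dates)
```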

Complete dates

Now, for every shop-item pair, we do a left join with all possible dates:

```python
def complete_dates(x, df_dates):
    item = x["item"].iloc[0]
    shop = x["shop"].iloc[0]
    x = pd.merge(df_dates, x,
                 on=["date"],
                 how="left")
    x["item"] = item
    x["shop"] = shop
    return x
```

Finally, we apply this function to every item-shop group of `df`:

```python
df.groupby(["item", "shop"])\
  .apply(lambda x:
         complete_dates(x, df_dates)
        )\
  .reset_index(drop=True)
```
```
         date  item shop   qty
0  2018-01-01     1    A   NaN
1  2018-01-02     1    A   5.0
2  2018-01-03     1    A   6.0
3  2018-01-04     1    A   NaN
4  2018-01-05     1    A   NaN
5  2018-01-01     1    B   NaN
6  2018-01-02     1    B   NaN
7  2018-01-03     1    B   NaN
8  2018-01-04     1    B   9.0
9  2018-01-05     1    B  10.0
10 2018-01-01     2    A   7.0
11 2018-01-02     2    A   0.0
12 2018-01-03     2    A   0.0
13 2018-01-04     2    A   8.0
14 2018-01-05     2    A   NaN
```
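This result starts every group at the global minimum date. If each group should instead start at its own first date (as in the other answers), one option is to drop the leading rows before each group's first non-missing `qty` and set the remaining NaN to 0. A sketch, with the 15-row result above re-entered by hand so it runs on its own; `completed`, `started` and `trimmed` are illustrative names:

```python
import numpy as np
import pandas as pd

# The 15-row result shown above, typed in by hand
days = ["2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04", "2018-01-05"]
completed = pd.DataFrame({
    "date": pd.to_datetime(days * 3),
    "item": [1] * 10 + [2] * 5,
    "shop": ["A"] * 5 + ["B"] * 5 + ["A"] * 5,
    "qty": [np.nan, 5, 6, np.nan, np.nan,
            np.nan, np.nan, np.nan, 9, 10,
            7, 0, 0, 8, np.nan],
})

# Running count of non-missing qty within each (item, shop);
# a count > 0 means "on or after the group's first recorded date"
started = (completed["qty"].notna().astype(int)
           .groupby([completed["item"], completed["shop"]])
           .cumsum() > 0)

# Drop the leading padding rows, then treat the remaining gaps as zero sales
trimmed = completed[started].fillna({"qty": 0}).reset_index(drop=True)
print(trimmed)
```

This works here because the resample step already filled the gaps inside each group with 0, so the first non-missing `qty` marks the group's first date.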