resample a start & end employee holiday table correctly


IIUC,

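    import pandas as pd
    # per row: every day from start_date to end_date; explode() gives each day its own row
    df_out = (df.set_index(['name','holiday_type'])
                .apply(lambda x: pd.date_range(x['start_date'], x['end_date']), axis=1)
                .explode().rename('date').reset_index())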

Output:

        name holiday_type       date
    0   Khan      holiday 2020-01-01
    1   Khan      holiday 2020-01-02
    2   Khan      holiday 2020-01-03
    3   Khan      holiday 2020-02-04
    4   Khan      holiday 2020-02-05
    ..   ...          ...        ...
    76  Dean   sick leave 2020-12-27
    77  Dean   sick leave 2020-12-28
    78  Dean   sick leave 2020-12-29
    79  Dean   sick leave 2020-12-30
    80  Dean   sick leave 2020-12-31

    [81 rows x 3 columns]

Dictionary output:

    df_out.to_dict()

Output:

{'name': {0: 'Khan',  1: 'Khan',  2: 'Khan',  3: 'Khan',  4: 'Khan',  5: 'Khan',  6: 'Khan',  7: 'Khan',  8: 'Khan',  9: 'Khan',  10: 'Dean',  11: 'Dean',  12: 'Dean',  13: 'Dean',  14: 'Dean',  15: 'Dean',  16: 'Dean',  17: 'Dean',  18: 'Dean',  19: 'Dean',  20: 'Dean',  21: 'Dean',  22: 'Dean',  23: 'Dean',  24: 'Dean',  25: 'Dean',  26: 'Dean',  27: 'Dean',  28: 'Dean',  29: 'Dean',  30: 'Dean',  31: 'Dean',  32: 'Dean',  33: 'Dean',  34: 'Dean',  35: 'Dean',  36: 'Dean',  37: 'Dean',  38: 'Dean',  39: 'Dean',  40: 'Dean',  41: 'Dean',  42: 'Dean',  43: 'Dean',  44: 'Dean',  45: 'Dean',  46: 'Dean',  47: 'Dean',  48: 'Dean',  49: 'Dean',  50: 'Dean',  51: 'Dean',  52: 'Dean',  53: 'Dean',  54: 'Dean',  55: 'Dean',  56: 'Dean',  57: 'Dean',  58: 'Dean',  59: 'Dean',  60: 'Dean',  61: 'Dean',  62: 'Dean',  63: 'Dean',  64: 'Dean',  65: 'Dean',  66: 'Dean',  67: 'Dean',  68: 'Dean',  69: 'Dean',  70: 'Dean',  71: 'Dean',  72: 'Dean',  73: 'Dean',  74: 'Dean',  75: 'Dean',  76: 'Dean',  77: 'Dean',  78: 'Dean',  79: 'Dean',  80: 'Dean'}, 'holiday_type': {0: 'holiday',  1: 'holiday',  2: 'holiday',  3: 'holiday',  4: 'holiday',  5: 'holiday',  6: 'holiday',  7: 'holiday',  8: 'holiday',  9: 'sick leave',  10: 'holiday',  11: 'holiday',  12: 'holiday',  13: 'holiday',  14: 'holiday',  15: 'holiday',  16: 'holiday',  17: 'holiday',  18: 'holiday',  19: 'holiday',  20: 'holiday',  21: 'holiday',  22: 'holiday',  23: 'holiday',  24: 'holiday',  25: 'holiday',  26: 'holiday',  27: 'holiday',  28: 'holiday',  29: 'holiday',  30: 'holiday',  31: 'holiday',  32: 'holiday',  33: 'holiday',  34: 'holiday',  35: 'holiday',  36: 'holiday',  37: 'holiday',  38: 'holiday',  39: 'holiday',  40: 'holiday',  41: 'holiday',  42: 'holiday',  43: 'holiday',  44: 'holiday',  45: 'holiday',  46: 'holiday',  47: 'holiday',  48: 'holiday',  49: 'holiday',  50: 'holiday',  51: 'holiday',  52: 'holiday',  53: 'holiday',  54: 'holiday',  55: 'holiday',  56: 'holiday',  57: 'holiday',  58: 
'holiday',  59: 'holiday',  60: 'holiday',  61: 'sick leave',  62: 'sick leave',  63: 'sick leave',  64: 'sick leave',  65: 'sick leave',  66: 'sick leave',  67: 'sick leave',  68: 'sick leave',  69: 'sick leave',  70: 'sick leave',  71: 'sick leave',  72: 'sick leave',  73: 'sick leave',  74: 'sick leave',  75: 'sick leave',  76: 'sick leave',  77: 'sick leave',  78: 'sick leave',  79: 'sick leave',  80: 'sick leave'}, 'date': {0: Timestamp('2020-01-01 00:00:00'),  1: Timestamp('2020-01-02 00:00:00'),  2: Timestamp('2020-01-03 00:00:00'),  3: Timestamp('2020-02-04 00:00:00'),  4: Timestamp('2020-02-05 00:00:00'),  5: Timestamp('2020-02-06 00:00:00'),  6: Timestamp('2020-02-07 00:00:00'),  7: Timestamp('2020-02-08 00:00:00'),  8: Timestamp('2020-02-09 00:00:00'),  9: Timestamp('2020-03-02 00:00:00'),  10: Timestamp('2020-04-09 00:00:00'),  11: Timestamp('2020-04-10 00:00:00'),  12: Timestamp('2020-04-11 00:00:00'),  13: Timestamp('2020-04-12 00:00:00'),  14: Timestamp('2020-04-13 00:00:00'),  15: Timestamp('2020-04-14 00:00:00'),  16: Timestamp('2020-04-15 00:00:00'),  17: Timestamp('2020-04-16 00:00:00'),  18: Timestamp('2020-04-17 00:00:00'),  19: Timestamp('2020-04-18 00:00:00'),  20: Timestamp('2020-04-19 00:00:00'),  21: Timestamp('2020-04-20 00:00:00'),  22: Timestamp('2020-04-21 00:00:00'),  23: Timestamp('2020-04-22 00:00:00'),  24: Timestamp('2020-04-23 00:00:00'),  25: Timestamp('2020-04-24 00:00:00'),  26: Timestamp('2020-04-25 00:00:00'),  27: Timestamp('2020-04-26 00:00:00'),  28: Timestamp('2020-04-27 00:00:00'),  29: Timestamp('2020-04-28 00:00:00'),  30: Timestamp('2020-04-29 00:00:00'),  31: Timestamp('2020-04-30 00:00:00'),  32: Timestamp('2020-05-01 00:00:00'),  33: Timestamp('2020-05-02 00:00:00'),  34: Timestamp('2020-05-03 00:00:00'),  35: Timestamp('2020-05-04 00:00:00'),  36: Timestamp('2020-05-05 00:00:00'),  37: Timestamp('2020-05-06 00:00:00'),  38: Timestamp('2020-05-07 00:00:00'),  39: Timestamp('2020-05-08 00:00:00'),  40: 
Timestamp('2020-05-09 00:00:00'),  41: Timestamp('2020-05-10 00:00:00'),  42: Timestamp('2020-05-11 00:00:00'),  43: Timestamp('2020-05-12 00:00:00'),  44: Timestamp('2020-05-13 00:00:00'),  45: Timestamp('2020-05-14 00:00:00'),  46: Timestamp('2020-05-15 00:00:00'),  47: Timestamp('2020-08-06 00:00:00'),  48: Timestamp('2020-08-07 00:00:00'),  49: Timestamp('2020-08-08 00:00:00'),  50: Timestamp('2020-08-09 00:00:00'),  51: Timestamp('2020-08-10 00:00:00'),  52: Timestamp('2020-08-11 00:00:00'),  53: Timestamp('2020-08-12 00:00:00'),  54: Timestamp('2020-08-13 00:00:00'),  55: Timestamp('2020-08-14 00:00:00'),  56: Timestamp('2020-08-15 00:00:00'),  57: Timestamp('2020-08-16 00:00:00'),  58: Timestamp('2020-08-17 00:00:00'),  59: Timestamp('2020-08-18 00:00:00'),  60: Timestamp('2020-08-19 00:00:00'),  61: Timestamp('2020-12-12 00:00:00'),  62: Timestamp('2020-12-13 00:00:00'),  63: Timestamp('2020-12-14 00:00:00'),  64: Timestamp('2020-12-15 00:00:00'),  65: Timestamp('2020-12-16 00:00:00'),  66: Timestamp('2020-12-17 00:00:00'),  67: Timestamp('2020-12-18 00:00:00'),  68: Timestamp('2020-12-19 00:00:00'),  69: Timestamp('2020-12-20 00:00:00'),  70: Timestamp('2020-12-21 00:00:00'),  71: Timestamp('2020-12-22 00:00:00'),  72: Timestamp('2020-12-23 00:00:00'),  73: Timestamp('2020-12-24 00:00:00'),  74: Timestamp('2020-12-25 00:00:00'),  75: Timestamp('2020-12-26 00:00:00'),  76: Timestamp('2020-12-27 00:00:00'),  77: Timestamp('2020-12-28 00:00:00'),  78: Timestamp('2020-12-29 00:00:00'),  79: Timestamp('2020-12-30 00:00:00'),  80: Timestamp('2020-12-31 00:00:00')}}
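The question's input frame isn't shown here, so below is a self-contained sketch of the apply + explode approach on a small hypothetical df (made-up rows, not the asker's data; start_date and end_date are assumed to be parsed as datetimes already). The last step is an extra that the answer doesn't have: explode() leaves the date column as object dtype, so pd.to_datetime converts it to a proper datetime64 column.

```python
import pandas as pd

# Hypothetical input (made-up rows, not the asker's data): one row per leave period.
df = pd.DataFrame({
    'name': ['Khan', 'Dean'],
    'holiday_type': ['holiday', 'sick leave'],
    'start_date': pd.to_datetime(['2020-01-01', '2020-12-28']),
    'end_date': pd.to_datetime(['2020-01-03', '2020-12-31']),
})

# For each row, build a DatetimeIndex of every day from start_date to end_date
# (inclusive), then explode() gives each day its own row.
df_out = (df.set_index(['name', 'holiday_type'])
            .apply(lambda x: pd.date_range(x['start_date'], x['end_date']), axis=1)
            .explode()
            .rename('date')
            .reset_index())

# explode() leaves an object column of Timestamps; convert it to datetime64.
df_out['date'] = pd.to_datetime(df_out['date'])
print(df_out)
```

Because name and holiday_type travel through explode() as the index, reset_index() turns them back into ordinary columns.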


Similar to @Scott Boston's answer, but with groupby.resample:

    (df.set_index(['name','holiday_type'], append=True).stack()  # start/end dates become rows
       .reset_index(name='date_range')
       .set_index('date_range')
       .groupby('level_0')                                     # one group per original row
       .resample('D')[['name','holiday_type']].ffill()         # daily rows, forward-filled
       .reset_index()
       [['name', 'date_range', 'holiday_type']])

Output:

        name date_range holiday_type
    0   Khan 2020-01-01      holiday
    1   Khan 2020-01-02      holiday
    2   Khan 2020-01-03      holiday
    3   Khan 2020-02-04      holiday
    4   Khan 2020-02-05      holiday
    5   Khan 2020-02-06      holiday
    6   Khan 2020-02-07      holiday
    7   Khan 2020-02-08      holiday
    8   Khan 2020-02-09      holiday
    9   Khan 2020-03-02   sick leave
    10  Dean 2020-04-09      holiday
    11  Dean 2020-04-10      holiday
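A self-contained sketch of the groupby/resample approach, again on a small hypothetical df (made-up rows, datetime columns assumed). The drop_duplicates step is an addition to the answer above: a one-day period (start_date equal to end_date) stacks into two rows with the same timestamp, and upsampling a group whose index repeats a timestamp may raise, so each (level_0, date_range) pair is kept only once.

```python
import pandas as pd

# Hypothetical input (made-up rows); the second row is a one-day period.
df = pd.DataFrame({
    'name': ['Khan', 'Khan', 'Dean'],
    'holiday_type': ['holiday', 'sick leave', 'holiday'],
    'start_date': pd.to_datetime(['2020-01-01', '2020-03-02', '2020-04-09']),
    'end_date': pd.to_datetime(['2020-01-03', '2020-03-02', '2020-04-12']),
})

stacked = (df.set_index(['name', 'holiday_type'], append=True)
             .stack()                                      # start/end dates become rows
             .reset_index(name='date_range')               # unnamed levels become level_0 / level_3
             .drop_duplicates(['level_0', 'date_range'])   # one-day periods stack to two equal rows
             .set_index('date_range'))

df_out = (stacked.groupby('level_0')                       # one group per original row
                 .resample('D')[['name', 'holiday_type']]
                 .ffill()                                  # daily rows, forward-filled within each group
                 .reset_index()
                 [['name', 'date_range', 'holiday_type']])
print(df_out)
```

Grouping on level_0 (the original row number) keeps two periods for the same person from being filled into each other.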


An alternative solution using pd.Series.map:

    import pandas as pd
    df = df.set_index(['name','holiday_type'])
    # one (start_date, end_date) tuple per row, stored in a single column
    df['date_range'] = list(zip(df['start_date'], df['end_date']))
    df['date_range'].map(lambda x: pd.date_range(*x)).explode().reset_index()

Output:

        name holiday_type date_range
    0   Khan      holiday 2020-01-01
    1   Khan      holiday 2020-01-02
    2   Khan      holiday 2020-01-03
    3   Khan      holiday 2020-02-04
    4   Khan      holiday 2020-02-05
    ..   ...          ...        ...
    76  Dean   sick leave 2020-12-27
    77  Dean   sick leave 2020-12-28
    78  Dean   sick leave 2020-12-29
    79  Dean   sick leave 2020-12-30
    80  Dean   sick leave 2020-12-31

    [81 rows x 3 columns]
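A self-contained sketch of the Series.map variant on a small hypothetical df (made-up rows, not the asker's data; datetime columns assumed). Rather than adding a helper column to df, it builds the Series of (start, end) tuples directly, indexed by a (name, holiday_type) MultiIndex. That restructuring is a choice of this sketch; the map + explode core is the same as in the answer.

```python
import pandas as pd

# Hypothetical input (made-up rows, not the asker's data).
df = pd.DataFrame({
    'name': ['Khan', 'Dean'],
    'holiday_type': ['holiday', 'sick leave'],
    'start_date': pd.to_datetime(['2020-02-04', '2020-12-29']),
    'end_date': pd.to_datetime(['2020-02-09', '2020-12-31']),
})

# One (start, end) tuple per row, indexed by (name, holiday_type).
pairs = pd.Series(list(zip(df['start_date'], df['end_date'])),
                  index=pd.MultiIndex.from_frame(df[['name', 'holiday_type']]))

# map() turns each tuple into a daily DatetimeIndex; explode() gives one row per day.
df_out = (pairs.map(lambda x: pd.date_range(*x))
               .explode()
               .rename('date_range')
               .reset_index())
print(df_out)
```

Building the Series directly leaves the original df untouched, so it can still be reused for the other approaches.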