Resample a start and end employee holiday table correctly
IIUC, you can build a daily date range for each row and then explode it into one row per date:

    df_out = (
        df.set_index(['name', 'holiday_type'])
          .apply(lambda x: pd.date_range(x['start_date'], x['end_date']), axis=1)
          .explode()
          .rename('date')
          .reset_index()
    )
Output:
        name holiday_type       date
    0   Khan      holiday 2020-01-01
    1   Khan      holiday 2020-01-02
    2   Khan      holiday 2020-01-03
    3   Khan      holiday 2020-02-04
    4   Khan      holiday 2020-02-05
    ..   ...          ...        ...
    76  Dean   sick leave 2020-12-27
    77  Dean   sick leave 2020-12-28
    78  Dean   sick leave 2020-12-29
    79  Dean   sick leave 2020-12-30
    80  Dean   sick leave 2020-12-31

    [81 rows x 3 columns]
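Since the question's input frame is not shown here, the following is a minimal, self-contained sketch of the same apply/explode idea on a small made-up table. The names `sample` and `expanded` and the three sample rows are assumptions for illustration only; the column names follow the ones used in the snippet above.

```python
import pandas as pd

# Hypothetical sample data: one row per leave period (not the asker's real table).
sample = pd.DataFrame({
    "name": ["Khan", "Khan", "Dean"],
    "holiday_type": ["holiday", "sick leave", "holiday"],
    "start_date": pd.to_datetime(["2020-01-01", "2020-03-02", "2020-04-09"]),
    "end_date": pd.to_datetime(["2020-01-03", "2020-03-02", "2020-04-10"]),
})

expanded = (
    sample.set_index(["name", "holiday_type"])
          # one DatetimeIndex per row, covering start_date..end_date inclusive
          .apply(lambda x: pd.date_range(x["start_date"], x["end_date"]), axis=1)
          # one output row per day in each range
          .explode()
          .rename("date")
          .reset_index()
)

# explode() typically leaves an object column, so convert back to datetime64.
expanded["date"] = pd.to_datetime(expanded["date"])

print(expanded)  # 3 + 1 + 2 = 6 rows
```

A one-day period (start equal to end) simply yields a single row, because `pd.date_range` includes both endpoints.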
Dictionary output:
df_out.to_dict()
Output:
{'name': {0: 'Khan', 1: 'Khan', 2: 'Khan', 3: 'Khan', 4: 'Khan', 5: 'Khan', 6: 'Khan', 7: 'Khan', 8: 'Khan', 9: 'Khan', 10: 'Dean', 11: 'Dean', 12: 'Dean', 13: 'Dean', 14: 'Dean', 15: 'Dean', 16: 'Dean', 17: 'Dean', 18: 'Dean', 19: 'Dean', 20: 'Dean', 21: 'Dean', 22: 'Dean', 23: 'Dean', 24: 'Dean', 25: 'Dean', 26: 'Dean', 27: 'Dean', 28: 'Dean', 29: 'Dean', 30: 'Dean', 31: 'Dean', 32: 'Dean', 33: 'Dean', 34: 'Dean', 35: 'Dean', 36: 'Dean', 37: 'Dean', 38: 'Dean', 39: 'Dean', 40: 'Dean', 41: 'Dean', 42: 'Dean', 43: 'Dean', 44: 'Dean', 45: 'Dean', 46: 'Dean', 47: 'Dean', 48: 'Dean', 49: 'Dean', 50: 'Dean', 51: 'Dean', 52: 'Dean', 53: 'Dean', 54: 'Dean', 55: 'Dean', 56: 'Dean', 57: 'Dean', 58: 'Dean', 59: 'Dean', 60: 'Dean', 61: 'Dean', 62: 'Dean', 63: 'Dean', 64: 'Dean', 65: 'Dean', 66: 'Dean', 67: 'Dean', 68: 'Dean', 69: 'Dean', 70: 'Dean', 71: 'Dean', 72: 'Dean', 73: 'Dean', 74: 'Dean', 75: 'Dean', 76: 'Dean', 77: 'Dean', 78: 'Dean', 79: 'Dean', 80: 'Dean'}, 'holiday_type': {0: 'holiday', 1: 'holiday', 2: 'holiday', 3: 'holiday', 4: 'holiday', 5: 'holiday', 6: 'holiday', 7: 'holiday', 8: 'holiday', 9: 'sick leave', 10: 'holiday', 11: 'holiday', 12: 'holiday', 13: 'holiday', 14: 'holiday', 15: 'holiday', 16: 'holiday', 17: 'holiday', 18: 'holiday', 19: 'holiday', 20: 'holiday', 21: 'holiday', 22: 'holiday', 23: 'holiday', 24: 'holiday', 25: 'holiday', 26: 'holiday', 27: 'holiday', 28: 'holiday', 29: 'holiday', 30: 'holiday', 31: 'holiday', 32: 'holiday', 33: 'holiday', 34: 'holiday', 35: 'holiday', 36: 'holiday', 37: 'holiday', 38: 'holiday', 39: 'holiday', 40: 'holiday', 41: 'holiday', 42: 'holiday', 43: 'holiday', 44: 'holiday', 45: 'holiday', 46: 'holiday', 47: 'holiday', 48: 'holiday', 49: 'holiday', 50: 'holiday', 51: 'holiday', 52: 'holiday', 53: 'holiday', 54: 'holiday', 55: 'holiday', 56: 'holiday', 57: 'holiday', 58: 'holiday', 59: 'holiday', 60: 'holiday', 61: 'sick leave', 62: 'sick leave', 63: 'sick leave', 64: 'sick leave', 65: 'sick leave', 66: 'sick 
leave', 67: 'sick leave', 68: 'sick leave', 69: 'sick leave', 70: 'sick leave', 71: 'sick leave', 72: 'sick leave', 73: 'sick leave', 74: 'sick leave', 75: 'sick leave', 76: 'sick leave', 77: 'sick leave', 78: 'sick leave', 79: 'sick leave', 80: 'sick leave'}, 'date': {0: Timestamp('2020-01-01 00:00:00'), 1: Timestamp('2020-01-02 00:00:00'), 2: Timestamp('2020-01-03 00:00:00'), 3: Timestamp('2020-02-04 00:00:00'), 4: Timestamp('2020-02-05 00:00:00'), 5: Timestamp('2020-02-06 00:00:00'), 6: Timestamp('2020-02-07 00:00:00'), 7: Timestamp('2020-02-08 00:00:00'), 8: Timestamp('2020-02-09 00:00:00'), 9: Timestamp('2020-03-02 00:00:00'), 10: Timestamp('2020-04-09 00:00:00'), 11: Timestamp('2020-04-10 00:00:00'), 12: Timestamp('2020-04-11 00:00:00'), 13: Timestamp('2020-04-12 00:00:00'), 14: Timestamp('2020-04-13 00:00:00'), 15: Timestamp('2020-04-14 00:00:00'), 16: Timestamp('2020-04-15 00:00:00'), 17: Timestamp('2020-04-16 00:00:00'), 18: Timestamp('2020-04-17 00:00:00'), 19: Timestamp('2020-04-18 00:00:00'), 20: Timestamp('2020-04-19 00:00:00'), 21: Timestamp('2020-04-20 00:00:00'), 22: Timestamp('2020-04-21 00:00:00'), 23: Timestamp('2020-04-22 00:00:00'), 24: Timestamp('2020-04-23 00:00:00'), 25: Timestamp('2020-04-24 00:00:00'), 26: Timestamp('2020-04-25 00:00:00'), 27: Timestamp('2020-04-26 00:00:00'), 28: Timestamp('2020-04-27 00:00:00'), 29: Timestamp('2020-04-28 00:00:00'), 30: Timestamp('2020-04-29 00:00:00'), 31: Timestamp('2020-04-30 00:00:00'), 32: Timestamp('2020-05-01 00:00:00'), 33: Timestamp('2020-05-02 00:00:00'), 34: Timestamp('2020-05-03 00:00:00'), 35: Timestamp('2020-05-04 00:00:00'), 36: Timestamp('2020-05-05 00:00:00'), 37: Timestamp('2020-05-06 00:00:00'), 38: Timestamp('2020-05-07 00:00:00'), 39: Timestamp('2020-05-08 00:00:00'), 40: Timestamp('2020-05-09 00:00:00'), 41: Timestamp('2020-05-10 00:00:00'), 42: Timestamp('2020-05-11 00:00:00'), 43: Timestamp('2020-05-12 00:00:00'), 44: Timestamp('2020-05-13 00:00:00'), 45: Timestamp('2020-05-14 
00:00:00'), 46: Timestamp('2020-05-15 00:00:00'), 47: Timestamp('2020-08-06 00:00:00'), 48: Timestamp('2020-08-07 00:00:00'), 49: Timestamp('2020-08-08 00:00:00'), 50: Timestamp('2020-08-09 00:00:00'), 51: Timestamp('2020-08-10 00:00:00'), 52: Timestamp('2020-08-11 00:00:00'), 53: Timestamp('2020-08-12 00:00:00'), 54: Timestamp('2020-08-13 00:00:00'), 55: Timestamp('2020-08-14 00:00:00'), 56: Timestamp('2020-08-15 00:00:00'), 57: Timestamp('2020-08-16 00:00:00'), 58: Timestamp('2020-08-17 00:00:00'), 59: Timestamp('2020-08-18 00:00:00'), 60: Timestamp('2020-08-19 00:00:00'), 61: Timestamp('2020-12-12 00:00:00'), 62: Timestamp('2020-12-13 00:00:00'), 63: Timestamp('2020-12-14 00:00:00'), 64: Timestamp('2020-12-15 00:00:00'), 65: Timestamp('2020-12-16 00:00:00'), 66: Timestamp('2020-12-17 00:00:00'), 67: Timestamp('2020-12-18 00:00:00'), 68: Timestamp('2020-12-19 00:00:00'), 69: Timestamp('2020-12-20 00:00:00'), 70: Timestamp('2020-12-21 00:00:00'), 71: Timestamp('2020-12-22 00:00:00'), 72: Timestamp('2020-12-23 00:00:00'), 73: Timestamp('2020-12-24 00:00:00'), 74: Timestamp('2020-12-25 00:00:00'), 75: Timestamp('2020-12-26 00:00:00'), 76: Timestamp('2020-12-27 00:00:00'), 77: Timestamp('2020-12-28 00:00:00'), 78: Timestamp('2020-12-29 00:00:00'), 79: Timestamp('2020-12-30 00:00:00'), 80: Timestamp('2020-12-31 00:00:00')}}
Similar to @Scott Boston's answer, but with groupby.resample:
    (df.set_index(['name', 'holiday_type'], append=True)
       .stack()                          # start_date and end_date become two rows per period
       .reset_index(name='date_range')
       .set_index('date_range')
       .groupby('level_0')               # level_0 is the original row index
       .resample('D')[['name', 'holiday_type']]   # select several columns with a list
       .ffill()
       .reset_index()
       [['name', 'date_range', 'holiday_type']])

Output:

        name date_range holiday_type
    0   Khan 2020-01-01      holiday
    1   Khan 2020-01-02      holiday
    2   Khan 2020-01-03      holiday
    3   Khan 2020-02-04      holiday
    4   Khan 2020-02-05      holiday
    5   Khan 2020-02-06      holiday
    6   Khan 2020-02-07      holiday
    7   Khan 2020-02-08      holiday
    8   Khan 2020-02-09      holiday
    9   Khan 2020-03-02   sick leave
    10  Dean 2020-04-09      holiday
    11  Dean 2020-04-10      holiday
Alternate solution using pd.Series.map:
    df = df.set_index(['name', 'holiday_type'])   # assign the result; set_index is not in-place
    # pair each row's start and end date so that map can unpack them
    df['date_range'] = list(zip(df['start_date'], df['end_date']))
    df.date_range.map(lambda x: pd.date_range(*x)).explode().reset_index()

Output:

        name holiday_type date_range
    0   Khan      holiday 2020-01-01
    1   Khan      holiday 2020-01-02
    2   Khan      holiday 2020-01-03
    3   Khan      holiday 2020-02-04
    4   Khan      holiday 2020-02-05
    ..   ...          ...        ...
    76  Dean   sick leave 2020-12-27
    77  Dean   sick leave 2020-12-28
    78  Dean   sick leave 2020-12-29
    79  Dean   sick leave 2020-12-30
    80  Dean   sick leave 2020-12-31

    [81 rows x 3 columns]