Calculating row-wise time difference in python
I think for correct maximal and minimal values are converted columns to datetimes and then subtract Series
created by GroupBy.transform
:
my_df['s'] = pd.to_datetime(my_df['date'] + ' ' + my_df['arriving_time'])my_df['e'] = pd.to_datetime(my_df['date'] + ' ' + my_df['leaving_time'])g = my_df.groupby(['id', 'date'])my_df['travelTime'] = g['e'].transform('max').sub(g['s'].transform('min'))print (my_df) id date arriving_time leaving_time s \0 a 2020/02/03 14:36:06 14:40:05 2020-02-03 14:36:06 1 b 2020/04/05 08:52:02 08:52:41 2020-04-05 08:52:02 2 b 2020/04/05 08:53:02 08:54:33 2020-04-05 08:53:02 3 b 2020/04/05 08:55:24 08:57:14 2020-04-05 08:55:24 4 b 2020/04/06 18:58:03 19:01:07 2020-04-06 18:58:03 5 b 2020/04/06 19:03:05 19:04:08 2020-04-06 19:03:05 6 c 2020/12/15 17:04:28 17:09:48 2020-12-15 17:04:28 7 d 2020/06/23 21:31:23 21:50:12 2020-06-23 21:31:23 e travelTime 0 2020-02-03 14:40:05 00:03:59 1 2020-04-05 08:52:41 00:05:12 2 2020-04-05 08:54:33 00:05:12 3 2020-04-05 08:57:14 00:05:12 4 2020-04-06 19:01:07 00:06:05 5 2020-04-06 19:04:08 00:06:05 6 2020-12-15 17:09:48 00:05:20 7 2020-06-23 21:50:12 00:18:49
For avoid new columns is possible use DataFrame.assign
Series with datetimes
:
s = pd.to_datetime(my_df['date'] + ' ' + my_df['arriving_time'])e = pd.to_datetime(my_df['date'] + ' ' + my_df['leaving_time'])g = my_df.assign(s=s, e=e).groupby(['id', 'date'])my_df['travelTime'] = g['e'].transform('max').sub(g['s'].transform('min'))print (my_df) id date arriving_time leaving_time travelTime0 a 2020/02/03 14:36:06 14:40:05 00:03:591 b 2020/04/05 08:52:02 08:52:41 00:05:122 b 2020/04/05 08:53:02 08:54:33 00:05:123 b 2020/04/05 08:55:24 08:57:14 00:05:124 b 2020/04/06 18:58:03 19:01:07 00:06:055 b 2020/04/06 19:03:05 19:04:08 00:06:056 c 2020/12/15 17:04:28 17:09:48 00:05:207 d 2020/06/23 21:31:23 21:50:12 00:18:49
IIUC we first groupby
id
& date
to get the max and min leave & arrival time.
then a simple subtraction.
df2 = df.groupby(['id','date']).agg(min_arrival=('arriving_time','min'), max_leave=('leaving_time','max'))df2['travelTime'] = pd.to_datetime(df2['max_leave']) - pd.to_datetime(df2['min_arrival']) print(df2) min_arrival max_leave travelTimeid date a 2020-02-03 14:36:06 14:40:05 00:03:59b 2020-04-05 08:52:02 08:57:14 00:05:12 2020-04-06 18:58:03 19:04:08 00:06:05c 2020-12-15 17:04:28 17:09:48 00:05:20d 2020-06-23 21:31:23 21:50:12 00:18:49
if you want this back on yout original df, you could use transform
or merge the values from the new delta onto your original :
df_new = (pd.merge(df,df2[['travelTime']],on=['date','id'],how='left') id date arriving_time leaving_time travelTime0 a 2020-02-03 14:36:06 14:40:05 00:03:591 b 2020-04-05 08:52:02 08:52:41 00:05:122 b 2020-04-05 08:53:02 08:54:33 00:05:123 b 2020-04-05 08:55:24 08:57:14 00:05:124 b 2020-04-06 18:58:03 19:01:07 00:06:055 b 2020-04-06 19:03:05 19:04:08 00:06:056 c 2020-12-15 17:04:28 17:09:48 00:05:207 d 2020-06-23 21:31:23 21:50:12 00:18:49
You could try this -
my_df['arriving_time'] = pd.to_datetime(my_df['arriving_time'])my_df['leaving_time'] = pd.to_datetime(my_df['leaving_time'])my_df['travel_time'] = my_df.groupby(['id', 'date'])['leaving_time'].transform('max') - my_df.groupby(['id', 'date'])['arriving_time'].transform('min')my_df id date arriving_time leaving_time travel_time0 a 2020/02/03 2020-03-19 14:36:06 2020-03-19 14:40:05 00:03:591 b 2020/04/05 2020-03-19 08:52:02 2020-03-19 08:52:41 00:05:122 b 2020/04/05 2020-03-19 08:53:02 2020-03-19 08:54:33 00:05:123 b 2020/04/05 2020-03-19 08:55:24 2020-03-19 08:57:14 00:05:124 b 2020/04/06 2020-03-19 18:58:03 2020-03-19 19:01:07 00:06:055 b 2020/04/06 2020-03-19 19:03:05 2020-03-19 19:04:08 00:06:056 c 2020/12/15 2020-03-19 17:04:28 2020-03-19 17:09:48 00:05:207 d 2020/06/23 2020-03-19 21:31:23 2020-03-19 21:50:12 00:18:49