Calculating row-wise time difference in python Calculating row-wise time difference in python pandas pandas

Calculating row-wise time difference in python


I think for correct maximal and minimal values are converted columns to datetimes and then subtract Series created by GroupBy.transform:

my_df['s'] = pd.to_datetime(my_df['date'] + ' ' + my_df['arriving_time'])my_df['e'] = pd.to_datetime(my_df['date'] + ' ' + my_df['leaving_time'])g = my_df.groupby(['id', 'date'])my_df['travelTime'] = g['e'].transform('max').sub(g['s'].transform('min'))print (my_df)  id        date arriving_time leaving_time                   s  \0  a  2020/02/03      14:36:06     14:40:05 2020-02-03 14:36:06   1  b  2020/04/05      08:52:02     08:52:41 2020-04-05 08:52:02   2  b  2020/04/05      08:53:02     08:54:33 2020-04-05 08:53:02   3  b  2020/04/05      08:55:24     08:57:14 2020-04-05 08:55:24   4  b  2020/04/06      18:58:03     19:01:07 2020-04-06 18:58:03   5  b  2020/04/06      19:03:05     19:04:08 2020-04-06 19:03:05   6  c  2020/12/15      17:04:28     17:09:48 2020-12-15 17:04:28   7  d  2020/06/23      21:31:23     21:50:12 2020-06-23 21:31:23                       e travelTime  0 2020-02-03 14:40:05   00:03:59  1 2020-04-05 08:52:41   00:05:12  2 2020-04-05 08:54:33   00:05:12  3 2020-04-05 08:57:14   00:05:12  4 2020-04-06 19:01:07   00:06:05  5 2020-04-06 19:04:08   00:06:05  6 2020-12-15 17:09:48   00:05:20  7 2020-06-23 21:50:12   00:18:49  

For avoid new columns is possible use DataFrame.assign Series with datetimes:

s = pd.to_datetime(my_df['date'] + ' ' + my_df['arriving_time'])e = pd.to_datetime(my_df['date'] + ' ' + my_df['leaving_time'])g = my_df.assign(s=s, e=e).groupby(['id', 'date'])my_df['travelTime'] = g['e'].transform('max').sub(g['s'].transform('min'))print (my_df)  id        date arriving_time leaving_time travelTime0  a  2020/02/03      14:36:06     14:40:05   00:03:591  b  2020/04/05      08:52:02     08:52:41   00:05:122  b  2020/04/05      08:53:02     08:54:33   00:05:123  b  2020/04/05      08:55:24     08:57:14   00:05:124  b  2020/04/06      18:58:03     19:01:07   00:06:055  b  2020/04/06      19:03:05     19:04:08   00:06:056  c  2020/12/15      17:04:28     17:09:48   00:05:207  d  2020/06/23      21:31:23     21:50:12   00:18:49


IIUC we first groupby id & date to get the max and min leave & arrival time.

then a simple subtraction.

df2 = df.groupby(['id','date']).agg(min_arrival=('arriving_time','min'),                             max_leave=('leaving_time','max'))df2['travelTime'] =  pd.to_datetime(df2['max_leave']) - pd.to_datetime(df2['min_arrival']) print(df2)              min_arrival max_leave travelTimeid date                                       a  2020-02-03    14:36:06  14:40:05   00:03:59b  2020-04-05    08:52:02  08:57:14   00:05:12   2020-04-06    18:58:03  19:04:08   00:06:05c  2020-12-15    17:04:28  17:09:48   00:05:20d  2020-06-23    21:31:23  21:50:12   00:18:49

if you want this back on yout original df, you could use transform or merge the values from the new delta onto your original :

df_new = (pd.merge(df,df2[['travelTime']],on=['date','id'],how='left')  id       date arriving_time leaving_time   travelTime0  a 2020-02-03      14:36:06     14:40:05     00:03:591  b 2020-04-05      08:52:02     08:52:41     00:05:122  b 2020-04-05      08:53:02     08:54:33     00:05:123  b 2020-04-05      08:55:24     08:57:14     00:05:124  b 2020-04-06      18:58:03     19:01:07     00:06:055  b 2020-04-06      19:03:05     19:04:08     00:06:056  c 2020-12-15      17:04:28     17:09:48     00:05:207  d 2020-06-23      21:31:23     21:50:12     00:18:49


You could try this -

my_df['arriving_time'] = pd.to_datetime(my_df['arriving_time'])my_df['leaving_time'] = pd.to_datetime(my_df['leaving_time'])my_df['travel_time'] = my_df.groupby(['id', 'date'])['leaving_time'].transform('max') - my_df.groupby(['id', 'date'])['arriving_time'].transform('min')my_df    id        date       arriving_time        leaving_time travel_time0  a  2020/02/03 2020-03-19 14:36:06 2020-03-19 14:40:05    00:03:591  b  2020/04/05 2020-03-19 08:52:02 2020-03-19 08:52:41    00:05:122  b  2020/04/05 2020-03-19 08:53:02 2020-03-19 08:54:33    00:05:123  b  2020/04/05 2020-03-19 08:55:24 2020-03-19 08:57:14    00:05:124  b  2020/04/06 2020-03-19 18:58:03 2020-03-19 19:01:07    00:06:055  b  2020/04/06 2020-03-19 19:03:05 2020-03-19 19:04:08    00:06:056  c  2020/12/15 2020-03-19 17:04:28 2020-03-19 17:09:48    00:05:207  d  2020/06/23 2020-03-19 21:31:23 2020-03-19 21:50:12    00:18:49