Calculate Time Difference Between Two Pandas Columns in Hours and Minutes
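Subtracting two pandas timestamps returns a pandas.Timedelta, which is a subclass of datetime.timedelta; subtracting two datetime columns returns a timedelta64[ns] Series. The result can be converted into hours by using the *astype* method, like so: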
    import pandas

    df = pandas.DataFrame(columns=['to','fr','ans'])
    df.to = [pandas.Timestamp('2014-01-24 13:03:12.050000'),
             pandas.Timestamp('2014-01-27 11:57:18.240000'),
             pandas.Timestamp('2014-01-23 10:07:47.660000')]
    df.fr = [pandas.Timestamp('2014-01-26 23:41:21.870000'),
             pandas.Timestamp('2014-01-27 15:38:22.540000'),
             pandas.Timestamp('2014-01-23 18:50:41.420000')]
    (df.fr-df.to).astype('timedelta64[h]')
to yield,
    0    58
    1     3
    2     8
    dtype: float64
This was driving me bonkers, as the .astype() solution above didn't work for me. But I found another way. I haven't timed it or anything, but it might work for others out there:
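    import pandas as pd

    t1 = pd.to_datetime('1/1/2015 01:00')
    t2 = pd.to_datetime('1/1/2015 03:30')
    # .total_seconds() counts whole days too; .seconds would drop them
    print(pd.Timedelta(t2 - t1).total_seconds() / 3600.0)
...if you want hours. Or:
    print(pd.Timedelta(t2 - t1).total_seconds() / 60.0)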
...if you want minutes.
UPDATE: There used to be a helpful comment here that mentioned using .total_seconds() for time periods spanning multiple days. Since it's gone, I've updated the answer.
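To illustrate why .total_seconds() matters for spans longer than a day, here is a minimal sketch. The two timestamps are hypothetical values, chosen only so that the span crosses a day boundary.

```python
import pandas as pd

# Hypothetical timestamps (not from the question): 2 days and 2.5 hours apart
t1 = pd.to_datetime('2015-01-01 01:00')
t2 = pd.to_datetime('2015-01-03 03:30')
delta = t2 - t1

# .seconds holds only the sub-day remainder (2 h 30 min here)
hours_from_seconds = delta.seconds / 3600.0

# .total_seconds() covers the whole span, days included
hours_from_total = delta.total_seconds() / 3600.0

print(hours_from_seconds)  # 2.5
print(hours_from_total)    # 50.5
```

For spans shorter than one day the two results agree, which is why the difference is easy to miss.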
- How do I convert my results to only hours and minutes?
  - The accepted answer only returns days + hours. Minutes are not included.
- To provide a column that has hours and minutes as hh:mm or x hours y minutes would require additional calculations and string formatting.
- This answer shows how to get either total hours or total minutes as a float, using timedelta math, and is faster than using .astype('timedelta64[h]')
- Pandas Time Deltas User Guide
- Pandas Time series / date functionality User Guide
- Python timedelta objects: See supported operations.
- The following sample data is already a datetime64[ns] dtype. It is required that all relevant columns are converted using pandas.to_datetime().
    import pandas as pd

    # test data from OP, with values already in a datetime format
    data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'),
                        pd.Timestamp('2014-01-27 11:57:18.240000'),
                        pd.Timestamp('2014-01-23 10:07:47.660000')],
            'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'),
                          pd.Timestamp('2014-01-27 15:38:22.540000'),
                          pd.Timestamp('2014-01-23 18:50:41.420000')]}

    # test dataframe; the columns must be in a datetime format; use pandas.to_datetime if needed
    df = pd.DataFrame(data)

    # add a timedelta column if wanted. It's added here for information only
    # df['time_delta_with_sub'] = df.from_date.sub(df.to_date)  # also works
    df['time_delta'] = (df.from_date - df.to_date)

    # create a column with timedelta as total hours, as a float type
    df['tot_hour_diff'] = (df.from_date - df.to_date) / pd.Timedelta(hours=1)

    # create a column with timedelta as total minutes, as a float type
    df['tot_mins_diff'] = (df.from_date - df.to_date) / pd.Timedelta(minutes=1)

    # display(df)
                      to_date               from_date             time_delta  tot_hour_diff  tot_mins_diff
    0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000      58.636061    3518.163667
    1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000       3.684528     221.071667
    2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000       8.714933     522.896000
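The "additional calculations and string formatting" mentioned above for an hh:mm or "x hours y minutes" column could look roughly like the sketch below. It is one possible approach, not the only one. It starts from string columns to show the pandas.to_datetime() conversion step, reuses two rows of the sample data, and assumes from_date is never earlier than to_date. The output column names hh_mm and hours_minutes are made up for illustration.

```python
import pandas as pd

# Two rows of the sample data, stored as strings to show the conversion step
df = pd.DataFrame({
    'to_date': ['2014-01-24 13:03:12.050000', '2014-01-27 11:57:18.240000'],
    'from_date': ['2014-01-26 23:41:21.870000', '2014-01-27 15:38:22.540000'],
})
for col in ['to_date', 'from_date']:
    df[col] = pd.to_datetime(df[col])

delta = df.from_date - df.to_date

# Whole minutes in each span; leftover seconds are truncated, not rounded
total_minutes = delta // pd.Timedelta(minutes=1)
hours = total_minutes // 60
minutes = total_minutes % 60

# Hypothetical column names; hours are allowed to exceed 24
df['hh_mm'] = hours.astype(str).str.zfill(2) + ':' + minutes.astype(str).str.zfill(2)
df['hours_minutes'] = hours.astype(str) + ' hours ' + minutes.astype(str) + ' minutes'

print(df[['hh_mm', 'hours_minutes']])
```

Floor division by pd.Timedelta(minutes=1) keeps the arithmetic in integers, so the formatted strings do not pick up floating-point noise.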
Other methods
- An item of note from the podcast in Other Resources: .total_seconds() was added and merged when the core developer was on vacation, and would not have been approved.
  - This is also why there aren't other .total_xx methods.
    # convert the entire timedelta to seconds
    # this is the same as td / timedelta(seconds=1)
    (df.from_date - df.to_date).dt.total_seconds()
    [out]:
    0    211089.82
    1     13264.30
    2     31373.76
    dtype: float64

    # get the number of days
    (df.from_date - df.to_date).dt.days
    [out]:
    0    2
    1    0
    2    0
    dtype: int64

    # get the seconds for hours + minutes + seconds, but not days
    # note the difference from total_seconds
    (df.from_date - df.to_date).dt.seconds
    [out]:
    0    38289
    1    13264
    2    31373
    dtype: int64
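Alongside .dt.days and .dt.seconds, the .dt.components accessor splits each timedelta into separate integer columns. The sketch below is an illustration only; it rebuilds two of the time deltas from the sample data as a standalone Series.

```python
import pandas as pd

# Two of the time deltas from the sample data, rebuilt as a standalone Series
delta = pd.Series([pd.Timedelta('2 days 10:38:09.82'),
                   pd.Timedelta('0 days 03:41:04.3')])

# One integer column per unit: days, hours, minutes, seconds, ...
parts = delta.dt.components

# Fold the days into the hour count; minutes stay as the remainder
total_hours = parts.days * 24 + parts.hours

print(parts[['days', 'hours', 'minutes']])
print(total_hours)
```

Unlike .dt.seconds, which lumps hours, minutes and seconds into one number, the components frame keeps each unit apart.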
Other Resources
- Talk Python to Me: Episode #271: Unlock the mysteries of time, Python's datetime that is!
- Timedelta begins at 31 minutes
- As per Python core developer Paul Ganssle and python dateutil maintainer:
  - Use (df.from_date - df.to_date) / pd.Timedelta(hours=1)
  - Don't use (df.from_date - df.to_date).dt.total_seconds() / 3600
- Real Python: Using Python datetime to Work With Dates and Times
- The dateutil module provides powerful extensions to the standard datetime module.
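The "use division, don't use total_seconds" advice above also applies to plain standard-library timedelta objects. As a small sketch with a made-up duration, dividing by a unit timedelta gives the float count of that unit, and divmod splits a duration into whole units plus a remainder:

```python
from datetime import timedelta

# Made-up duration for illustration: 2 days, 10 hours, 38 minutes
td = timedelta(days=2, hours=10, minutes=38)

# Recommended form: divide by the unit you want
hours_by_division = td / timedelta(hours=1)

# Discouraged form: same value, but the unit is buried in a magic number
hours_by_seconds = td.total_seconds() / 3600

# Whole hours plus the leftover as a timedelta
whole_hours, remainder = divmod(td, timedelta(hours=1))

print(hours_by_division)
print(whole_hours, remainder)  # 58 0:38:00
```

The division form states the target unit explicitly and works for any unit, including ones such as weeks that have no dedicated method.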
%%timeit test
    import pandas as pd

    # dataframe with 2M rows
    data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'),
                        pd.Timestamp('2014-01-27 11:57:18.240000')],
            'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'),
                          pd.Timestamp('2014-01-27 15:38:22.540000')]}
    df = pd.DataFrame(data)
    df = pd.concat([df] * 1000000).reset_index(drop=True)

    %%timeit
    (df.from_date - df.to_date) / pd.Timedelta(hours=1)
    [out]:
    43.1 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    %%timeit
    (df.from_date - df.to_date).astype('timedelta64[h]')
    [out]:
    59.8 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)