Pandas: Subtracting two date columns and the result being an integer
How about:
df_test['Difference'] = (df_test['First_Date'] - df_test['Second Date']).dt.days
This will return the difference as int if there are no missing values (NaT) and as float if there are.
Pandas has rich documentation on Time series / date functionality and Time deltas.
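To make the int-versus-float behaviour described above concrete, here is a minimal sketch. The three-row frame is made-up sample data (only the column names come from the snippet above), and the variable names `with_gap` and `without_gap` are just for illustration.

```python
import pandas as pd

# Hypothetical sample data; the last First_Date is missing (NaT).
df_test = pd.DataFrame({
    "First_Date": pd.to_datetime(["2016-02-09", "2016-01-06", None]),
    "Second Date": pd.to_datetime(["2015-11-19", "2015-11-30", "2015-12-04"]),
})

# Subtracting two datetime columns gives a timedelta column.
diff = df_test["First_Date"] - df_test["Second Date"]

with_gap = diff.dt.days              # contains NaT, so the result is float with NaN
without_gap = diff.dropna().dt.days  # no NaT left, so the result is an integer dtype

print(with_gap.dtype, without_gap.dtype)
```

The exact integer width may differ between pandas versions, so it is safer to check with `pd.api.types.is_integer_dtype` than to compare against a specific dtype name.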
You can divide a column of dtype timedelta by np.timedelta64(1, 'D'), but the output is float rather than int because of the NaN values:
df_test['Difference'] = df_test['Difference'] / np.timedelta64(1, 'D')
print (df_test)
  First_Date Second Date  Difference
0 2016-02-09  2015-11-19        82.0
1 2016-01-06  2015-11-30        37.0
2        NaT  2015-12-04         NaN
3 2016-01-06  2015-12-08        29.0
4        NaT  2015-12-09         NaN
5 2016-01-07  2015-12-11        27.0
6        NaT  2015-12-12         NaN
7        NaT  2015-12-14         NaN
8 2016-01-06  2015-12-14        23.0
9        NaT  2015-12-15         NaN
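One practical difference between dividing by np.timedelta64(1, 'D') and using .dt.days is that the division keeps fractions of a day, while .dt.days only reports whole days. A small sketch, using made-up timestamps (the names `start`, `end`, `as_days` and `whole_days` are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps; the second start is at noon, the third end is missing.
start = pd.Series(pd.to_datetime(
    ["2015-11-19 00:00", "2015-11-30 12:00", "2015-12-04 00:00"]))
end = pd.Series(pd.to_datetime(
    ["2016-02-09 00:00", "2016-01-06 00:00", None]))

delta = end - start

# Division by a one-day timedelta gives float days, fractions included.
as_days = delta / np.timedelta64(1, "D")

# .dt.days keeps only the whole-day component.
whole_days = delta.dt.days

print(as_days.tolist())
print(whole_days.tolist())
```

If the columns hold pure dates with no time component, both approaches give the same numbers, and the choice is mostly a matter of taste.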
You can use the datetime module to help here. Also, as a side note, a simple date subtraction should work as below:
import datetime as dt
import numpy as np
import pandas as pd

#Assume we have df_test:
In [222]: df_test
Out[222]:
   first_date second_date
0  2016-01-31  2015-11-19
1  2016-02-29  2015-11-20
2  2016-03-31  2015-11-21
3  2016-04-30  2015-11-22
4  2016-05-31  2015-11-23
5  2016-06-30  2015-11-24
6         NaT  2015-11-25
7         NaT  2015-11-26
8  2016-01-31  2015-11-27
9         NaT  2015-11-28
10        NaT  2015-11-29
11        NaT  2015-11-30
12 2016-04-30  2015-12-01
13        NaT  2015-12-02
14        NaT  2015-12-03
15 2016-04-30  2015-12-04
16        NaT  2015-12-05
17        NaT  2015-12-06

In [223]: df_test['Difference'] = df_test['first_date'] - df_test['second_date']

In [224]: df_test
Out[224]:
   first_date second_date Difference
0  2016-01-31  2015-11-19    73 days
1  2016-02-29  2015-11-20   101 days
2  2016-03-31  2015-11-21   131 days
3  2016-04-30  2015-11-22   160 days
4  2016-05-31  2015-11-23   190 days
5  2016-06-30  2015-11-24   219 days
6         NaT  2015-11-25        NaT
7         NaT  2015-11-26        NaT
8  2016-01-31  2015-11-27    65 days
9         NaT  2015-11-28        NaT
10        NaT  2015-11-29        NaT
11        NaT  2015-11-30        NaT
12 2016-04-30  2015-12-01   151 days
13        NaT  2015-12-02        NaT
14        NaT  2015-12-03        NaT
15 2016-04-30  2015-12-04   148 days
16        NaT  2015-12-05        NaT
17        NaT  2015-12-06        NaT
Now, change the type to datetime.timedelta, and then read the .days attribute on the valid timedelta objects (missing values are mapped to NaN).
In [226]: df_test['Diffference'] = df_test['Difference'].astype(dt.timedelta).map(lambda x: np.nan if pd.isnull(x) else x.days)

In [227]: df_test
Out[227]:
   first_date second_date Difference  Diffference
0  2016-01-31  2015-11-19    73 days           73
1  2016-02-29  2015-11-20   101 days          101
2  2016-03-31  2015-11-21   131 days          131
3  2016-04-30  2015-11-22   160 days          160
4  2016-05-31  2015-11-23   190 days          190
5  2016-06-30  2015-11-24   219 days          219
6         NaT  2015-11-25        NaT          NaN
7         NaT  2015-11-26        NaT          NaN
8  2016-01-31  2015-11-27    65 days           65
9         NaT  2015-11-28        NaT          NaN
10        NaT  2015-11-29        NaT          NaN
11        NaT  2015-11-30        NaT          NaN
12 2016-04-30  2015-12-01   151 days          151
13        NaT  2015-12-02        NaT          NaN
14        NaT  2015-12-03        NaT          NaN
15 2016-04-30  2015-12-04   148 days          148
16        NaT  2015-12-05        NaT          NaN
17        NaT  2015-12-06        NaT          NaN
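If the goal is a column that is genuinely integer-typed even when some dates are missing, one option not used in the session above is pandas' nullable "Int64" dtype. This is only a sketch on made-up data (the first two rows reuse dates from the table above; `first`, `second` and `days` are illustrative names), and it assumes a pandas version that supports nullable integers.

```python
import pandas as pd

# Hypothetical sample; the third first_date is missing.
first = pd.Series(pd.to_datetime(["2016-01-31", "2016-02-29", None]))
second = pd.Series(pd.to_datetime(["2015-11-19", "2015-11-20", "2015-11-25"]))

# .dt.days yields float with NaN here; casting to the nullable "Int64"
# dtype stores whole numbers and represents the gap as <NA>.
days = (first - second).dt.days.astype("Int64")

print(days)
```

This avoids the per-row lambda, at the cost of working with pd.NA instead of NaN in later steps.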
Hope that helps.