Pandas: Subtracting two date columns and the result being an integer


How about:

df_test['Difference'] = (df_test['First_Date'] - df_test['Second Date']).dt.days

This returns the difference as an integer column if there are no missing values (NaT), and as a float column if there are, because NaN cannot be stored in a plain integer dtype.

Pandas has rich documentation on Time series / date functionality and Time deltas.
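To make the int-versus-float behaviour concrete, here is a minimal sketch. The frame `df_demo` and its three rows are made up for illustration (the dates are borrowed from the examples on this page), and the last step assumes a pandas version that supports the nullable `Int64` dtype.

```python
import pandas as pd

# Hypothetical three-row frame; the last First_Date is missing (NaT).
df_demo = pd.DataFrame({
    "First_Date": pd.to_datetime(["2016-02-09", "2016-01-06", None]),
    "Second Date": pd.to_datetime(["2015-11-19", "2015-11-30", "2015-12-04"]),
})

delta = df_demo["First_Date"] - df_demo["Second Date"]

# With a NaT present, .dt.days falls back to float so it can hold NaN.
days_float = delta.dt.days

# Without missing values, the result is an integer column.
days_complete = delta.dropna().dt.days

# Assumption: nullable integers are available, so whole numbers and
# missing values can live in the same column.
days_nullable = delta.dt.days.astype("Int64")

print(days_float.dtype)
print(days_complete.tolist())
print(days_nullable)
```

If the column must stay integer even with missing dates, the nullable `Int64` conversion in the last step is one option; dropping or filling the NaT rows first is another.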


You can divide a column of dtype timedelta by np.timedelta64(1, 'D'). The output is float, not int: the division itself produces floats, and missing values become NaN:

df_test['Difference'] = df_test['Difference'] / np.timedelta64(1, 'D')
print(df_test)
  First_Date Second Date  Difference
0 2016-02-09  2015-11-19        82.0
1 2016-01-06  2015-11-30        37.0
2        NaT  2015-12-04         NaN
3 2016-01-06  2015-12-08        29.0
4        NaT  2015-12-09         NaN
5 2016-01-07  2015-12-11        27.0
6        NaT  2015-12-12         NaN
7        NaT  2015-12-14         NaN
8 2016-01-06  2015-12-14        23.0
9        NaT  2015-12-15         NaN

See "Frequency conversion" in the pandas Time deltas documentation.
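One practical difference between dividing by np.timedelta64(1, 'D') and using .dt.days is how partial days are handled. The sketch below uses a made-up Series named `delta`; its values are illustrative and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical timedelta values, including a partial day and a missing value.
delta = pd.Series(pd.to_timedelta(["82 days", "1 days 12:00:00", pd.NaT]))

# .dt.days keeps only the whole-day component of each value.
whole_days = delta.dt.days

# Dividing by a one-day timedelta keeps the fractional part.
fractional_days = delta / np.timedelta64(1, "D")

# pd.Timedelta(days=1) can be used as the divisor in the same way.
fractional_days_alt = delta / pd.Timedelta(days=1)

print(whole_days.tolist())
print(fractional_days.tolist())
```

For date-only columns the two approaches give the same numbers; they diverge only when the timestamps carry a time of day.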


You can use the datetime module to help here. As a side note, a simple date subtraction should also work, as shown below:

import datetime as dt
import numpy as np
import pandas as pd

# Assume we have df_test:
In [222]: df_test
Out[222]:
    first_date second_date
0  2016-01-31  2015-11-19
1  2016-02-29  2015-11-20
2  2016-03-31  2015-11-21
3  2016-04-30  2015-11-22
4  2016-05-31  2015-11-23
5  2016-06-30  2015-11-24
6         NaT  2015-11-25
7         NaT  2015-11-26
8  2016-01-31  2015-11-27
9         NaT  2015-11-28
10        NaT  2015-11-29
11        NaT  2015-11-30
12 2016-04-30  2015-12-01
13        NaT  2015-12-02
14        NaT  2015-12-03
15 2016-04-30  2015-12-04
16        NaT  2015-12-05
17        NaT  2015-12-06

In [223]: df_test['Difference'] = df_test['first_date'] - df_test['second_date']

In [224]: df_test
Out[224]:
    first_date second_date  Difference
0  2016-01-31  2015-11-19     73 days
1  2016-02-29  2015-11-20    101 days
2  2016-03-31  2015-11-21    131 days
3  2016-04-30  2015-11-22    160 days
4  2016-05-31  2015-11-23    190 days
5  2016-06-30  2015-11-24    219 days
6         NaT  2015-11-25         NaT
7         NaT  2015-11-26         NaT
8  2016-01-31  2015-11-27     65 days
9         NaT  2015-11-28         NaT
10        NaT  2015-11-29         NaT
11        NaT  2015-11-30         NaT
12 2016-04-30  2015-12-01    151 days
13        NaT  2015-12-02         NaT
14        NaT  2015-12-03         NaT
15 2016-04-30  2015-12-04    148 days
16        NaT  2015-12-05         NaT
17        NaT  2015-12-06         NaT

Now convert the column to datetime.timedelta objects, and then read the .days attribute on the valid (non-null) timedelta values.

In [226]: df_test['Diffference'] = df_test['Difference'].astype(dt.timedelta).map(lambda x: np.nan if pd.isnull(x) else x.days)

In [227]: df_test
Out[227]:
    first_date second_date  Difference  Diffference
0  2016-01-31  2015-11-19     73 days           73
1  2016-02-29  2015-11-20    101 days          101
2  2016-03-31  2015-11-21    131 days          131
3  2016-04-30  2015-11-22    160 days          160
4  2016-05-31  2015-11-23    190 days          190
5  2016-06-30  2015-11-24    219 days          219
6         NaT  2015-11-25         NaT          NaN
7         NaT  2015-11-26         NaT          NaN
8  2016-01-31  2015-11-27     65 days           65
9         NaT  2015-11-28         NaT          NaN
10        NaT  2015-11-29         NaT          NaN
11        NaT  2015-11-30         NaT          NaN
12 2016-04-30  2015-12-01    151 days          151
13        NaT  2015-12-02         NaT          NaN
14        NaT  2015-12-03         NaT          NaN
15 2016-04-30  2015-12-04    148 days          148
16        NaT  2015-12-05         NaT          NaN
17        NaT  2015-12-06         NaT          NaN
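For comparison, the element-wise mapping can be checked against the vectorized .dt.days accessor. This is only a sketch on a made-up three-row frame (`df_small` is a hypothetical name, with dates taken from the table above), and it maps over the timedelta column directly rather than calling astype first.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row subset of the data shown above.
df_small = pd.DataFrame({
    "first_date": pd.to_datetime(["2016-01-31", None, "2016-01-31"]),
    "second_date": pd.to_datetime(["2015-11-19", "2015-11-25", "2015-11-27"]),
})

diff = df_small["first_date"] - df_small["second_date"]

# Element-wise: read .days on each valid value, NaN for missing ones.
elementwise = diff.map(lambda x: np.nan if pd.isnull(x) else x.days)

# Vectorized: the .dt accessor applies the same idea to the whole column.
vectorized = diff.dt.days

print(elementwise.tolist())
print(vectorized.tolist())
```

On this sample both columns hold the same values; the vectorized form is usually shorter and avoids a Python-level loop.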

Hope that helps.