
How to convert string to datetime with nulls - python, pandas?


Just use to_datetime and set errors='coerce' to handle duff data:

    In [321]:
    df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
    df
    Out[321]:
                     Date
    0 2014-10-20 10:44:31
    1 2014-10-23 09:33:46
    2                 NaT
    3 2014-10-01 09:38:45

    In [322]:
    df.info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 4 entries, 0 to 3
    Data columns (total 1 columns):
    Date    3 non-null datetime64[ns]
    dtypes: datetime64[ns](1)
    memory usage: 64.0 bytes
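A self-contained version of the above, as a sketch. The sample data is hypothetical (the question's actual input isn't shown): a string `Date` column with one missing value and one malformed entry, both of which `errors='coerce'` turns into `NaT`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a string 'Date' column with one missing value
# (NaN) and one malformed entry.
df = pd.DataFrame({'Date': ['2014-10-20 10:44:31',
                            '2014-10-23 09:33:46',
                            np.nan,
                            'not a date',
                            '2014-10-01 09:38:45']})

# errors='coerce' turns anything that can't be parsed, including NaN, into NaT
# instead of raising an exception.
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

print(df)
print(df['Date'].isna().sum())  # 2
```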

The problem with calling strptime is that it raises an error if the string is malformed or if the value is not a string at all (for example, a NaN).
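To see both failure modes, here is a small sketch using the same format string as the example that follows; the sample values are made up. A malformed string raises `ValueError`, and a missing value (which pandas usually stores as a float `NaN` in a column of strings) raises `TypeError`, because `strptime` only accepts `str`.

```python
import datetime as dt

FMT = '%Y-%m-%d %H:%M:%S'

# Made-up inputs: a valid string, a malformed string, and a float NaN.
samples = ['2014-10-20 10:44:31', 'not a date', float('nan')]

outcomes = []
for value in samples:
    try:
        outcomes.append(dt.datetime.strptime(value, FMT))
    except (TypeError, ValueError) as exc:
        # Record which exception strptime raised for this input.
        outcomes.append(type(exc).__name__)

print(outcomes)
# [datetime.datetime(2014, 10, 20, 10, 44, 31), 'ValueError', 'TypeError']
```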

If you wrap the call in a try/except that returns NaT on failure, it works:

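    In [324]:
    import datetime as dt

    def func(x):
        try:
            return dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
        except (TypeError, ValueError):
            # TypeError for non-strings such as NaN, ValueError for malformed strings
            return pd.NaT

    df['Date'].apply(func)
    Out[324]:
    0   2014-10-20 10:44:31
    1   2014-10-23 09:33:46
    2                   NaT
    3   2014-10-01 09:38:45
    Name: Date, dtype: datetime64[ns]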

But it is faster to use the built-in to_datetime than to call apply, which essentially just loops over your Series in Python.

Timings:

    In [326]:
    %timeit pd.to_datetime(df['Date'], errors='coerce')
    %timeit df['Date'].apply(func)
    10000 loops, best of 3: 65.8 µs per loop
    10000 loops, best of 3: 186 µs per loop

Here to_datetime is roughly 3x faster (65.8 µs vs. 186 µs per loop) on this 4-row frame.
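Those timings come from a 4-row frame. To repeat the comparison at a larger scale on your own machine, here is a sketch that repeats the same four values into 10,000 rows (a hypothetical input, not the original data), checks that both approaches give the same result, and times each with the standard-library `timeit` module. The `func` here catches only `TypeError` and `ValueError` rather than using a bare `except`. Results depend on hardware and pandas version, so no output is shown.

```python
import datetime as dt
import timeit

import numpy as np
import pandas as pd

FMT = '%Y-%m-%d %H:%M:%S'


def func(x):
    # Row-by-row fallback: catch only the errors strptime raises for bad
    # strings (ValueError) and non-strings such as NaN (TypeError).
    try:
        return dt.datetime.strptime(x, FMT)
    except (TypeError, ValueError):
        return pd.NaT


# Hypothetical larger input: the four example values repeated to 10,000 rows.
base = ['2014-10-20 10:44:31', '2014-10-23 09:33:46', np.nan, '2014-10-01 09:38:45']
s = pd.Series(base * 2_500)

vectorized = pd.to_datetime(s, errors='coerce')
row_by_row = s.apply(func)

# Sanity check: both approaches give the same nulls and the same timestamps.
assert vectorized.isna().equals(row_by_row.isna())
assert (vectorized.dropna() == row_by_row.dropna()).all()

# Average wall-clock time per full conversion, over 5 runs each.
for label, stmt in [('to_datetime', lambda: pd.to_datetime(s, errors='coerce')),
                    ('apply(func)', lambda: s.apply(func))]:
    per_call = timeit.timeit(stmt, number=5) / 5
    print(f'{label}: {per_call * 1000:.2f} ms per call')
```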


I find letting pandas do the work too slow on large DataFrames. In another post I learned of a technique that speeds this up dramatically when the number of unique values is much smaller than the number of rows. (My data is usually stock price or trade blotter data.) It first builds a dict that maps each unique text date to its datetime object, then uses that dict to convert the whole column of text dates.

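    import datetime as dt

    import pandas as pd

    def str2time(val):
        # Parse one time-of-day string; anything unparseable (including NaN) becomes NaT.
        try:
            return dt.datetime.strptime(val, '%H:%M:%S.%f')
        except (TypeError, ValueError):
            return pd.NaT

    def TextTime2Time(s):
        # Parse each distinct string once, then look every row up in the dict.
        times = {t: str2time(t) for t in s.unique()}
        return s.map(times)

    df['date'] = TextTime2Time(df['date'])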
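The unique-values idea can also be combined with the vectorized parser from the first answer: parse only the distinct non-null strings with `pd.to_datetime(..., format=..., errors='coerce')`, then broadcast the results back with `Series.map`. This is a sketch; the helper name `convert_via_uniques` and the sample data are hypothetical. Note also that `pd.to_datetime` has a `cache` argument (on by default in recent pandas versions) that applies a similar unique-value cache internally, so it is worth benchmarking plain `pd.to_datetime` with an explicit `format` on your own data first.

```python
import pandas as pd


def convert_via_uniques(s: pd.Series, fmt: str) -> pd.Series:
    """Hypothetical helper: parse each distinct non-null string once, then map back.

    Unparseable strings and missing values both end up as NaT.
    """
    # Parse only the distinct non-null strings; invalid ones become NaT.
    uniques = pd.Series(s.dropna().unique())
    parsed = pd.to_datetime(uniques, format=fmt, errors='coerce')

    # Lookup table of parsed values, indexed by the original string.
    lookup = pd.Series(parsed.to_numpy(), index=uniques.to_numpy())

    # Rows whose value is not in the index (i.e. the nulls) come back as NaT.
    return s.map(lookup)


# Sample time-of-day strings in the '%H:%M:%S.%f' format used above.
times = pd.Series(['09:30:00.125', '09:30:00.125', 'bad', None, '16:00:00.000'])
print(convert_via_uniques(times, '%H:%M:%S.%f'))
```

Because the format has no date part, the parsed values fall on 1900-01-01, the same default date `strptime` uses.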