dask dataframe how to convert column to to_datetime


Use astype

You can use the astype method to convert the dtype of a series to a NumPy dtype

df.time.astype('M8[us]')

There is probably a way to specify a Pandas style dtype as well (edits welcome)

Use map_partitions and meta

When using black-box methods like map_partitions, dask.dataframe needs to know the type and names of the output. There are a few ways to do this listed in the docstring for map_partitions.

You can supply an empty Pandas object with the right dtype and name

meta = pd.Series([], name='time', dtype='datetime64[ns]')

Or you can provide a tuple of (name, dtype) for a Series or a dict for a DataFrame

meta = ('time', 'datetime64[ns]')

Then everything should be fine

df.time.map_partitions(pd.to_datetime, meta=meta)

If you were calling map_partitions on df instead then you would need to provide the dtypes for everything. That isn't the case in your example though.


Dask also comes with to_datetime, so this should work as well.

df['time'] = dd.to_datetime(df.time, unit='ns')

The values that unit accepts are the same as for pd.to_datetime in pandas (for example 'D', 's', 'ms', 'us', 'ns'); see the pandas documentation for the full list.


I'm not sure if this is the right approach, but mapping the column worked for me:

df['time'] = df['time'].map(lambda x: pd.to_datetime(x, errors='coerce'))