Keep only date part when using pandas.to_datetime
Since version 0.15.0, this can be done easily using the .dt accessor to get just the date component:
df['just_date'] = df['dates'].dt.date
The above returns a column of Python datetime.date objects, so its dtype is object. If you want to keep a datetime64 dtype, you can instead normalize the time component to midnight, which sets all the times to 00:00:00:
df['normalised_date'] = df['dates'].dt.normalize()
This keeps the dtype as datetime64, but the display shows just the date value.
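To make the dtype difference concrete, here is a minimal sketch. The sample timestamps are made up for illustration; the column names follow the snippets above.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame(
    {"dates": pd.to_datetime(["2021-03-04 10:15:00", "2021-03-05 23:59:59"])}
)

# .dt.date yields Python datetime.date objects, stored in an object column.
df["just_date"] = df["dates"].dt.date

# .dt.normalize() keeps datetime64 and sets every time to midnight.
df["normalised_date"] = df["dates"].dt.normalize()

print(df.dtypes)
print(df)
```

Printing df.dtypes should show object for just_date and a datetime64 dtype for normalised_date.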
While I upvoted EdChum's answer, which is the most direct answer to the question the OP posed, it does not really solve the performance problem: it still relies on Python datetime objects, so any operation on them will not be vectorized (that is, it will be slow).
A better-performing alternative is to use df['dates'].dt.floor('d'). Strictly speaking, it does not "keep only date part", since it just sets the time to 00:00:00. But it does work as desired by the OP when, for instance:
- printing to screen
- saving to csv
- using the column in a groupby
... and it is much more efficient, since the operation is vectorized.
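As a rough sketch of the groupby case, assuming made-up sample data and a hypothetical value column, flooring to the day lets rows from the same calendar date fall into one group. The example uses the uppercase "D" alias, which is the documented daily frequency alias.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame(
    {
        "dates": pd.to_datetime(
            ["2021-03-04 08:00", "2021-03-04 17:30", "2021-03-05 09:45"]
        ),
        "value": [1, 2, 3],
    }
)

# Vectorized: the result is still datetime64, with times set to 00:00:00.
df["day"] = df["dates"].dt.floor("D")

# Rows that share a calendar date now share a group key.
daily_totals = df.groupby("day")["value"].sum()
print(daily_totals)
```

Because the day column stays datetime64, later operations such as resampling or date arithmetic remain available on it.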
EDIT: in fact, the answer the OP would have preferred is probably "recent versions of pandas do not write the time to csv if it is 00:00:00 for all observations".
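If you would rather not depend on version-specific default formatting, one option is to pass date_format to to_csv explicitly. The sketch below uses made-up sample data and prints both the default and the explicit output so you can compare them on your own pandas version.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame(
    {
        "dates": pd.to_datetime(["2021-03-04 08:00", "2021-03-05 09:45"]),
        "value": [1, 2],
    }
)
df["dates"] = df["dates"].dt.normalize()

# Default formatting: what gets written may depend on the pandas version.
default_csv = df.to_csv(index=False)

# Explicit formatting: only the date part is written, regardless of version.
explicit_csv = df.to_csv(index=False, date_format="%Y-%m-%d")

print(default_csv)
print(explicit_csv)
```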