Pandas DatetimeIndex from MongoDB ISODate
I was able to reproduce the error with the following data:
    import pandas as pd
    from dateutil import tz

    idx0 = pd.date_range('2011-11-11', periods=4)
    idx1 = idx0.tz_localize(tz.tzutc())
    idx2 = idx1.tz_convert(tz.tzlocal())
    df = pd.DataFrame([1, 2, 3, 4])
    df.groupby(idx2).sum()
    Out[20]:
                               0
    1970-01-01 00:00:00-05:00  9
    2011-11-10 19:00:00-05:00  1
It's a bug deep in the pandas code, related exclusively to tz.tzlocal(). It also manifests itself in:
    idx2.tz_localize(None)
    Out[27]:
    DatetimeIndex(['2011-11-10 19:00:00', '1970-01-01 00:00:00',
                   '1970-01-01 00:00:00', '1970-01-01 00:00:00'],
                  dtype='datetime64[ns]', freq='D')
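To check whether a given pandas installation shows the symptom above, something like the following sketch may help. It assumes that bogus 1970-01-01 values appearing after the round trip are a reliable sign of the bug; the variable names are illustrative only.

```python
import pandas as pd
from dateutil import tz

# Four daily timestamps in UTC, converted to the machine's local zone.
idx_utc = pd.date_range('2011-11-11', periods=4, tz='UTC')
idx_local = idx_utc.tz_convert(tz.tzlocal())

# Dropping the tz should keep the local wall-clock times.
naive = idx_local.tz_localize(None)

# On an affected version, most entries collapse to the epoch (1970-01-01).
affected = bool((naive.year == 1970).any())
print("affected by the tzlocal bug:", affected)
```

On a pandas release that includes the fix, `affected` should come out as `False`.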
You can use any of the following solutions:
use your timezone explicitly, as a string:
    idx2 = idx1.tz_convert(tz='Europe/Dublin')
    df.groupby(idx2).sum()
    Out[29]:
                               0
    2011-11-11 00:00:00+00:00  1
    2011-11-12 00:00:00+00:00  2
    2011-11-13 00:00:00+00:00  3
    2011-11-14 00:00:00+00:00  4
or if it doesn't work:
    idx2 = idx1.tz_convert(tz.gettz('Europe/Dublin'))
convert the index to object dtype:
    df.groupby(idx2.astype(object)).sum()
    Out[32]:
                               0
    2011-11-10 19:00:00-05:00  1
    2011-11-11 19:00:00-05:00  2
    2011-11-12 19:00:00-05:00  3
    2011-11-13 19:00:00-05:00  4
Basically, converting to anything other than a DatetimeIndex with tz=tz.tzlocal() should work.
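The workarounds above can be put together in one self-contained sketch. It reuses the sample data from this answer; 'Europe/Dublin' is only an example zone, and the names `by_named` and `by_object` are illustrative, so substitute your own.

```python
import pandas as pd

# Same sample data as above: four daily timestamps, localized to UTC.
idx_utc = pd.date_range('2011-11-11', periods=4).tz_localize('UTC')
df = pd.DataFrame([1, 2, 3, 4])

# Workaround 1: convert to a named zone instead of tz.tzlocal().
idx_named = idx_utc.tz_convert('Europe/Dublin')
by_named = df.groupby(idx_named).sum()

# Workaround 2: group by an object-dtype index of Timestamp objects.
by_object = df.groupby(idx_named.astype(object)).sum()

# Every timestamp is distinct, so each group should hold exactly one row.
print(by_named)
print(by_object)
```

Either route avoids handing pandas a DatetimeIndex whose tz is tz.tzlocal(), which is the combination that triggers the bug.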
EDIT: This bug has just been fixed on the pandas GitHub repository. The fix will be available in the pandas 0.19 release.
I have managed to get around this for now by changing my groupby to the following:
    frame.groupby([pd.DatetimeIndex([x.date() for x in frame.index])]).sum()
So where I was originally trying to group by
    idx = pd.DatetimeIndex([x['Date'] for x in test_docs], freq='D')
    idx = idx.tz_localize(tz=tz.tzutc())
    idx = idx.tz_convert(tz=tz.tzlocal())
    frame.groupby(idx).sum()
I am now calling the date method on each element of the index prior to performing the groupby operation.
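As a rough, self-contained illustration of this workaround: the frame below is a made-up stand-in for my data, and 'America/New_York' is just an assumed zone picked to mirror the -05:00 offset seen earlier. The second grouping is a vectorized variant that I believe is equivalent, not something taken from the original code.

```python
import pandas as pd

# Hypothetical stand-in for the real frame: six-hourly values on a tz-aware index.
idx = pd.date_range('2011-11-11', periods=8, freq='6h', tz='UTC')
idx = idx.tz_convert('America/New_York')  # assumed zone, for illustration only
frame = pd.DataFrame({'value': range(8)}, index=idx)

# The workaround: take each element's local calendar date, which drops time and tz.
daily_keys = pd.DatetimeIndex([ts.date() for ts in frame.index])
daily = frame.groupby(daily_keys).sum()

# Vectorized variant: strip the tz (keeping local wall time), then floor to midnight.
daily_vec = frame.groupby(frame.index.tz_localize(None).normalize()).sum()

print(daily)  # sums per local date: 0 (2011-11-10), 10 (2011-11-11), 18 (2011-11-12)
```

Note that both versions group by the local calendar date, so a reading at 00:00 UTC lands on the previous day in a zone west of Greenwich.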
I'm posting this as an answer in case nobody replies, but I am hoping for someone to answer and explain what is happening, as my 'solution' seems too hacky for my tastes.