
Convert pandas timezone-aware DateTimeIndex to naive timestamp, but in certain timezone


To answer my own question, this functionality has been added to pandas in the meantime. Starting from pandas 0.15.0, you can use tz_localize(None) to remove the timezone resulting in local time.
See the whatsnew entry: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#timezone-handling-improvements

So with my example from above:

In [4]: t = pd.date_range(start="2013-05-18 12:00:00", periods=2, freq='H',
   ...:                   tz="Europe/Brussels")

In [5]: t
Out[5]: DatetimeIndex(['2013-05-18 12:00:00+02:00', '2013-05-18 13:00:00+02:00'],
                      dtype='datetime64[ns, Europe/Brussels]', freq='H')

Using tz_localize(None) removes the timezone information, resulting in naive local time:

In [6]: t.tz_localize(None)
Out[6]: DatetimeIndex(['2013-05-18 12:00:00', '2013-05-18 13:00:00'],
                      dtype='datetime64[ns]', freq='H')

You can also use tz_convert(None), which converts to UTC before removing the timezone information, yielding naive UTC time:

In [7]: t.tz_convert(None)
Out[7]: DatetimeIndex(['2013-05-18 10:00:00', '2013-05-18 11:00:00'],
                      dtype='datetime64[ns]', freq='H')
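As a self-contained sketch of the difference between the two calls, the snippet below builds the same two Brussels timestamps and strips the timezone both ways. The variable names (idx, naive_local, naive_utc) are only illustrative, and the index is built from strings rather than with date_range purely to keep the example short.

```python
import pandas as pd

# Two tz-aware timestamps in Brussels summer time (UTC+02:00).
idx = pd.DatetimeIndex(
    ["2013-05-18 12:00:00", "2013-05-18 13:00:00"]
).tz_localize("Europe/Brussels")

naive_local = idx.tz_localize(None)  # drop the tz, keep the wall-clock time
naive_utc = idx.tz_convert(None)     # convert to UTC first, then drop the tz

print(naive_local[0])  # 2013-05-18 12:00:00
print(naive_utc[0])    # 2013-05-18 10:00:00
```

Both results are timezone-naive; they differ only in which clock the remaining values refer to.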

tz_localize(None) is also much faster than the datetime.replace solution:

In [31]: t = pd.date_range(start="2013-05-18 12:00:00", periods=10000, freq='H',
   ....:                   tz="Europe/Brussels")

In [32]: %timeit t.tz_localize(None)
1000 loops, best of 3: 233 µs per loop

In [33]: %timeit pd.DatetimeIndex([i.replace(tzinfo=None) for i in t])
10 loops, best of 3: 99.7 ms per loop


Because I always struggle to remember, here is a quick summary of what each of these does:

>>> pd.Timestamp.now()  # naive local time
Timestamp('2019-10-07 10:30:19.428748')

>>> pd.Timestamp.utcnow()  # tz aware UTC
Timestamp('2019-10-07 08:30:19.428748+0000', tz='UTC')

>>> pd.Timestamp.now(tz='Europe/Brussels')  # tz aware local time
Timestamp('2019-10-07 10:30:19.428748+0200', tz='Europe/Brussels')

>>> pd.Timestamp.now(tz='Europe/Brussels').tz_localize(None)  # naive local time
Timestamp('2019-10-07 10:30:19.428748')

>>> pd.Timestamp.now(tz='Europe/Brussels').tz_convert(None)  # naive UTC
Timestamp('2019-10-07 08:30:19.428748')

>>> pd.Timestamp.utcnow().tz_localize(None)  # naive UTC
Timestamp('2019-10-07 08:30:19.428748')

>>> pd.Timestamp.utcnow().tz_convert(None)  # naive UTC
Timestamp('2019-10-07 08:30:19.428748')


I think you can't achieve what you want in a more efficient manner than you proposed.

The underlying problem is that the timestamps (as you seem aware) are made up of two parts: the data that represents the UTC time, and the timezone, tzinfo. The timezone information is used only for display purposes when printing the timestamp to the screen. At display time, the data is offset appropriately and +01:00 (or similar) is added to the string. Stripping off the tzinfo value (using tz_convert(tz=None)) doesn't actually change the data that represents the naive part of the timestamp.
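A small sketch can make this concrete. It assumes the asi8 attribute of DatetimeIndex, used here only as a convenient way to look at the stored integers; the variable names are illustrative.

```python
import pandas as pd

# Tz-aware index; the stored integers represent UTC instants.
idx = pd.DatetimeIndex(
    ["2013-05-18 12:00:00", "2013-05-18 13:00:00"]
).tz_localize("Europe/Brussels")

stored = idx.asi8                              # underlying integer values
after_convert = idx.tz_convert(None).asi8      # tz stripped via tz_convert
after_localize = idx.tz_localize(None).asi8    # tz stripped via tz_localize

# tz_convert(None) leaves the stored values untouched...
print((stored == after_convert).all())
# ...whereas tz_localize(None) yields different values (shifted by the UTC offset).
print((stored == after_localize).any())
```

In other words, getting naive local time requires new underlying values, not just a dropped label.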

So, the only way to do what you want is either to modify the underlying data (pandas doesn't allow this: a DatetimeIndex is immutable, see the help on DatetimeIndex), or to create a new set of timestamp objects and wrap them in a new DatetimeIndex. Your solution does the latter:

pd.DatetimeIndex([i.replace(tzinfo=None) for i in t])

For reference, here is the replace method of Timestamp (see tslib.pyx):

def replace(self, **kwds):
    return Timestamp(datetime.replace(self, **kwds),
                     offset=self.offset)

You can refer to the docs on datetime.datetime to see that datetime.datetime.replace also creates a new object.
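As a rough check of what the list-comprehension approach produces, the sketch below builds a new index from replaced timestamps and compares it with the result of tz_localize(None), which is available in pandas 0.15.0 and later. The names t, via_replace and via_localize are illustrative.

```python
import pandas as pd

t = pd.DatetimeIndex(
    ["2013-05-18 12:00:00", "2013-05-18 13:00:00"]
).tz_localize("Europe/Brussels")

# Create one new naive Timestamp per element and wrap them in a new index.
via_replace = pd.DatetimeIndex([ts.replace(tzinfo=None) for ts in t])

# Vectorised equivalent in newer pandas versions.
via_localize = t.tz_localize(None)

print(via_replace.equals(via_localize))  # True
```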

If you can, your best bet for efficiency is to modify the source of the data so that it (incorrectly) reports the timestamps without their timezone. You mentioned:

"I want to work with timezone naive timeseries (to avoid the extra hassle with timezones, and I do not need them for the case I am working on)"

I'd be curious what extra hassle you are referring to. As a general rule for all software development, I recommend keeping your timestamps' naive values in UTC. There is little worse than looking at two different int64 values and wondering which timezone they belong to. If you always, always, always use UTC for internal storage, you will avoid countless headaches. My mantra is: timezones are for human I/O only.
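One way to follow that rule in pandas, sketched under the assumption that the stored values are naive and UTC by convention: attach a timezone only at the point where values are shown to a person. The names stored and shown are hypothetical.

```python
import pandas as pd

# Internal storage: naive timestamps that are UTC by convention.
stored = pd.DatetimeIndex(["2013-05-18 10:00:00", "2013-05-18 11:00:00"])

# Human I/O: label the values as UTC, then convert to the viewer's timezone.
shown = stored.tz_localize("UTC").tz_convert("Europe/Brussels")

print(shown[0])  # 2013-05-18 12:00:00+02:00

# Going back to storage form is lossless.
print(shown.tz_convert(None).equals(stored))  # True
```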