
Pandas to_sql gives ValueError with timezone-aware column


Convert the column so that it holds pd.Timestamp objects before writing it to PostgreSQL. The code below worked for me:

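    import pandas as pd

    times = ['201510100222', '201510110333']
    df = pd.DataFrame()
    # Parse the strings as UTC: this gives a timezone-aware datetime64 column
    df['time'] = pd.to_datetime(times, format='%Y%m%d%H%M', utc=True)
    # Convert to object dtype so each cell is a tz-aware pd.Timestamp
    df['time'] = df['time'].astype(object)
    # engine: an existing SQLAlchemy engine for your PostgreSQL database
    df.time.to_sql('test', engine, if_exists='replace', index=False)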

But don't forget to create your database table with the data type TIMESTAMP WITH TIME ZONE. If to_sql creates the table for you, you have to specify that type explicitly:

    from sqlalchemy.types import TIMESTAMP

    df.time.to_sql('test', engine, if_exists='replace', index=False,
                   dtype={'time': TIMESTAMP(timezone=True)})
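The same workaround can be applied to every timezone-aware column of a DataFrame at once. This is a sketch, not part of the original answer: the helper names `tz_aware_columns` and `timestamps_as_objects` are hypothetical. It finds columns whose dtype is `pd.DatetimeTZDtype`, converts them to object dtype holding tz-aware `pd.Timestamp` values, and reports their names so you can build the `dtype` mapping for `to_sql`.

```python
import pandas as pd


def tz_aware_columns(df: pd.DataFrame) -> list:
    """Return the names of columns with a timezone-aware datetime64 dtype."""
    return [col for col in df.columns if isinstance(df[col].dtype, pd.DatetimeTZDtype)]


def timestamps_as_objects(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which each tz-aware column holds pd.Timestamp objects."""
    return df.astype({col: object for col in tz_aware_columns(df)})


times = ['201510100222', '201510110333']
df = pd.DataFrame({
    'time': pd.to_datetime(times, format='%Y%m%d%H%M', utc=True),
    'value': [1, 2],
})
converted = timestamps_as_objects(df)

print(tz_aware_columns(df))     # ['time']
print(converted['time'].dtype)  # object
```

With SQLAlchemy installed, you could then pass `dtype={col: TIMESTAMP(timezone=True) for col in tz_aware_columns(df)}` to `converted.to_sql(...)`, assuming `engine` points at your PostgreSQL database.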


You can convert your datetimes to strings that include the UTC offset:

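    import pandas as pd

    times = pd.to_datetime(['201510100222', '201510110333'], format='%Y%m%d%H%M', utc=True)
    df = pd.DataFrame()
    # %z keeps the offset in the text, e.g. '2015-10-10 02:22:00+0000'
    df['time'] = [time.strftime('%Y-%m-%d %H:%M:%S%z') for time in times]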

Then insert them into the database as TIMESTAMP WITH TIME ZONE values:

    from sqlalchemy import TIMESTAMP

    df.to_sql('test', engine, if_exists='replace', index=False,
              dtype={'time': TIMESTAMP(timezone=True)})

It's quite an ugly solution, but it works on my setup.
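If you go the string route, a small helper can format every timezone-aware column as text with an explicit offset. This is a sketch, not the answerer's code; `tz_columns_to_strings` is a hypothetical name, and it assumes PostgreSQL parses values such as `2015-10-10 02:22:00+0000` into the `TIMESTAMP WITH TIME ZONE` column you declare through `dtype`.

```python
import pandas as pd


def tz_columns_to_strings(df: pd.DataFrame, fmt: str = '%Y-%m-%d %H:%M:%S%z') -> pd.DataFrame:
    """Return a copy with every tz-aware datetime column formatted as text."""
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.DatetimeTZDtype):
            # %z writes the UTC offset (e.g. +0000), so no information is lost
            out[col] = out[col].dt.strftime(fmt)
    return out


times = pd.to_datetime(['201510100222', '201510110333'], format='%Y%m%d%H%M', utc=True)
df = pd.DataFrame({'time': times})
as_text = tz_columns_to_strings(df)
print(as_text['time'].tolist())
# ['2015-10-10 02:22:00+0000', '2015-10-11 03:33:00+0000']
```

The original DataFrame is left untouched because the helper works on a copy; the `dtype={'time': TIMESTAMP(timezone=True)}` argument from the answer is still what tells PostgreSQL to store the text as timestamptz.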

Note that PostgreSQL displays timestamptz values in the session's current time zone. Mine is Europe/Paris, so here's what I get when I query them in psql:

    test=# select * from test;
              time
    ------------------------
     2015-10-10 04:22:00+02
     2015-10-11 05:33:00+02
    (2 rows)

instead of something like

              time
    ------------------------
     2015-10-10 02:22:00+00
     2015-10-11 03:33:00+00
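The two listings show the same instants with different UTC offsets; only the display changes. A quick standard-library check, using the fixed `+02:00` offset taken from the Europe/Paris output (a fixed offset is a simplification here, not a full time zone):

```python
from datetime import datetime, timedelta, timezone

paris_summer = timezone(timedelta(hours=2))  # fixed +02:00 offset, as shown by psql

shown_in_paris = datetime(2015, 10, 10, 4, 22, tzinfo=paris_summer)
shown_in_utc = datetime(2015, 10, 10, 2, 22, tzinfo=timezone.utc)

# Aware datetimes compare by the instant they represent
print(shown_in_paris == shown_in_utc)  # True

# Normalizing to UTC gives the value PostgreSQL keeps for a timestamptz
print(shown_in_paris.astimezone(timezone.utc).isoformat())  # 2015-10-10T02:22:00+00:00
```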


This works with pandas 0.16.2, so you can simply downgrade pandas to avoid the error:

    conda remove pandas
    conda install pandas=0.16.2

IN THE DB:

(1) Set timezone = 'UTC' in postgresql.conf. This makes UTC the default time zone for all connections to your DB.

(2) Use timestamp with time zone (aka timestamptz) for all timestamp columns in your DB. PostgreSQL stores these values in UTC and converts them on output to the session's time zone setting.

IN PYTHON:

(3) Always create timezone-aware timestamps in UTC:

    from datetime import datetime

    import pytz

    def get_now_in_utc():
        # Timezone-aware "now" in UTC
        now = datetime.now(tz=pytz.utc)
        return now
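On Python 3 the same helper can be written without pytz by using the standard library's `datetime.timezone.utc`. This is a swapped-in alternative, not the answer's original code:

```python
from datetime import datetime, timedelta, timezone


def get_now_in_utc() -> datetime:
    """Return the current time as a timezone-aware datetime in UTC."""
    return datetime.now(tz=timezone.utc)


now = get_now_in_utc()
print(now.tzinfo)  # UTC
```

Avoid `datetime.utcnow()` for this purpose: it returns a naive datetime (no tzinfo), even though the clock reading is UTC.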

(4) Persist them with pandas to_sql.

RESULT:

(5) Your persisted timestamps will be timezone-aware and accurate.

(6) When reading the data back, you can always take the UTC time and convert it to whatever zone you like, either in the query (with the AT TIME ZONE expression) or in Python (with timezone conversions in your code).
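For the Python side of step (6), a minimal pandas sketch that converts a UTC column to a display time zone. The helper name `utc_to_local` is hypothetical, and treating naive input as UTC is an assumption that only holds if you know the values were stored and returned in UTC.

```python
import pandas as pd


def utc_to_local(s: pd.Series, tz: str) -> pd.Series:
    """Convert a datetime Series to `tz`; naive input is assumed to be UTC."""
    if s.dt.tz is None:
        s = s.dt.tz_localize('UTC')  # assumption: naive values are UTC
    return s.dt.tz_convert(tz)


utc_times = pd.Series(pd.to_datetime(['201510100222', '201510110333'],
                                     format='%Y%m%d%H%M', utc=True))
local = utc_to_local(utc_times, 'Europe/Paris')
print(local.dt.strftime('%Y-%m-%d %H:%M:%S%z').tolist())
# ['2015-10-10 04:22:00+0200', '2015-10-11 05:33:00+0200']
```

The result matches the Europe/Paris psql output shown earlier, since both are the same UTC instants rendered with the +02:00 offset in effect in October 2015.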