Python Pandas - Using to_sql to write large data frames in chunks
Update: this functionality has been merged into pandas master and will be released in 0.15 (probably end of September), thanks to @artemyk! See https://github.com/pydata/pandas/pull/8062
So starting from 0.15, you can specify the chunksize argument and e.g. simply do:
df.to_sql('table', engine, chunksize=20000)
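For a self-contained sketch of the same call, assuming an SQLAlchemy engine (the SQLite URL, table name, and sample data here are placeholders, not from the original answer):

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical engine; any SQLAlchemy engine (Postgres, MySQL, ...) works.
engine = create_engine('sqlite:///example.db')

# A sample frame large enough to be split into several batches.
df = pd.DataFrame({'a': range(100000), 'b': range(100000)})

# Rows are written in batches of 20000 per database round trip.
df.to_sql('table', engine, chunksize=20000)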
There is a beautiful, idiomatic function chunks provided in an answer to this question. In your case, you can use this function like this:
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l.iloc[i:i + n]

def write_to_db(engine, frame, table_name, chunk_size):
    for idx, chunk in enumerate(chunks(frame, chunk_size)):
        # Replace the table on the first chunk, append on the rest.
        if idx == 0:
            if_exists_param = 'replace'
        else:
            if_exists_param = 'append'
        chunk.to_sql(con=engine, name=table_name, if_exists=if_exists_param)
The only drawback is that it doesn't support slicing a second index inside the iloc call.
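For example, with the engine and df from the sketch above, this writes the frame in 20000-row batches:

write_to_db(engine, df, 'table', 20000)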
Reading from one table and writing to another in chunks. Here myconn1 is the connection to the source table, myconn2 is the connection to the target table, and ch = 10000 is the chunk size:
import logging
import pandas as pd

LOGGER = logging.getLogger(__name__)

for idx, chunk in enumerate(pd.read_sql_table(table_name=source, con=myconn1, chunksize=ch)):
    # Replace the target table on the first chunk only; using
    # if_exists="replace" on every chunk would wipe the rows
    # written by the previous iterations.
    chunk.to_sql(name=target, con=myconn2,
                 if_exists="replace" if idx == 0 else "append",
                 index=False, chunksize=ch)
    LOGGER.info("Done 1 chunk")
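For completeness, a minimal sketch of how the connections and names above might be wired up; the database URLs and table names are hypothetical placeholders, not part of the original answer:

from sqlalchemy import create_engine

# Hypothetical connection URLs; substitute your own databases.
myconn1 = create_engine('postgresql://user:pass@host1/source_db')
myconn2 = create_engine('postgresql://user:pass@host2/target_db')

source = 'source_table'   # hypothetical source table name
target = 'target_table'   # hypothetical target table name
ch = 10000                # rows per chunk

Because read_sql_table returns an iterator when chunksize is set, only one chunk is materialized as a DataFrame at a time (whether the rows are also streamed from the server depends on the database driver), so the copy can handle tables larger than available memory.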