
Python Pandas - Using to_sql to write large data frames in chunks


Update: this functionality has been merged into pandas master and will be released in 0.15 (probably end of September), thanks to @artemyk! See https://github.com/pydata/pandas/pull/8062

So, starting from 0.15, you can specify the chunksize argument and simply do:

df.to_sql('table', engine, chunksize=20000)
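
For completeness, a minimal end-to-end sketch (the in-memory SQLite database, table name, and toy frame below are hypothetical placeholders, not part of the original answer):

import pandas as pd
from sqlalchemy import create_engine

# hypothetical in-memory database and toy frame
engine = create_engine('sqlite://')
df = pd.DataFrame({'a': range(100000)})

# pandas issues the INSERTs in batches of 20,000 rows
df.to_sql('table', engine, chunksize=20000)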


There is a beautiful, idiomatic function, chunks, provided in an answer to this question.

In your case, you can use it like this:

def chunks(l, n):""" Yield successive n-sized chunks from l."""    for i in xrange(0, len(l), n):         yield l.iloc[i:i+n]def write_to_db(engine, frame, table_name, chunk_size):    for idx, chunk in enumerate(chunks(frame, chunk_size)):        if idx == 0:            if_exists_param = 'replace':        else:            if_exists_param = 'append'        chunk.to_sql(con=engine, name=table_name, if_exists=if_exists_param)

The only drawback is that it doesn't support slicing the second index in the iloc call.
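
For instance, reusing the hypothetical engine and df from the sketch above, the helper could be called like this:

# write df in chunks of 10,000 rows; the first chunk replaces the
# table 'my_table' (a hypothetical name), subsequent chunks append
write_to_db(engine, df, 'my_table', 10000)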


Reading from one table and writing to another in chunks:

Here myconn1 is the connection to the source table, myconn2 the connection to the target table, and ch = 10000 is the chunk size.

# LOGGER is a standard logging.Logger configured elsewhere
for idx, chunk in enumerate(pd.read_sql_table(table_name=source, con=myconn1, chunksize=ch)):
    # replace the target table with the first chunk, then append the rest;
    # using if_exists="replace" on every chunk would keep only the last one
    chunk.to_sql(name=target, con=myconn2,
                 if_exists="replace" if idx == 0 else "append",
                 index=False, chunksize=ch)
    LOGGER.info("Done 1 chunk")
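
One way to sanity-check the copy afterwards is to compare row counts on both sides; a sketch, reusing the source, target, myconn1, and myconn2 names from above:

# count rows in the source and target tables and make sure they match
src_rows = pd.read_sql(f"SELECT COUNT(*) AS n FROM {source}", myconn1)["n"].iloc[0]
tgt_rows = pd.read_sql(f"SELECT COUNT(*) AS n FROM {target}", myconn2)["n"].iloc[0]
assert src_rows == tgt_rows, "row counts differ after the chunked copy"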