Progress bar for pandas.DataFrame.to_sql


Unfortunately, DataFrame.to_sql does not provide a chunk-by-chunk callback, which tqdm needs to update its status. However, you can process the DataFrame chunk by chunk yourself:

    import sqlite3

    import pandas as pd
    from tqdm import tqdm

    DB_FILENAME = '/tmp/test.sqlite'

    def chunker(seq, size):
        # from http://stackoverflow.com/a/434328
        return (seq[pos:pos + size] for pos in range(0, len(seq), size))

    def insert_with_progress(df, dbfile):
        con = sqlite3.connect(dbfile)
        chunksize = int(len(df) / 10)  # 10% of the rows per chunk
        with tqdm(total=len(df)) as pbar:
            for i, cdf in enumerate(chunker(df, chunksize)):
                # recreate the table on the first chunk, then append
                replace = "replace" if i == 0 else "append"
                cdf.to_sql(con=con, name="MLS", if_exists=replace, index=False)
                # advance by the rows actually written (the last chunk may be shorter)
                pbar.update(len(cdf))
        con.close()

    df = pd.DataFrame({'a': range(0, 100000)})
    insert_with_progress(df, DB_FILENAME)

Note that I'm generating the DataFrame inline here so that the example is complete and runnable without any external data.

The result is quite stunning: tqdm draws a progress bar that advances as each chunk is written to the database.
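To make the chunking step concrete, here is a minimal sketch of what chunker yields when the row count is not a multiple of the chunk size. It uses a plain list as a stand-in for the DataFrame (an assumption made only to keep the sketch free of dependencies; row slicing on a DataFrame splits it the same way), and the 25-row size is purely illustrative.

```python
def chunker(seq, size):
    """Return a generator of consecutive slices of seq, each at most `size` long."""
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))


rows = list(range(25))        # stand-in for a 25-row DataFrame (illustrative)
chunksize = len(rows) // 10   # 2, following the 10% rule used above

# Length of every chunk, in order: twelve chunks of 2 rows, then one of 1 row.
lengths = [len(chunk) for chunk in chunker(rows, chunksize)]
print(lengths)
print(sum(lengths))  # 25
```

Thirteen updates of chunksize rows would add up to 26, one more than the bar's total of 25, which is why advancing the bar by the length of each chunk keeps the count exact.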


I wanted to share a variant of the solution posted by miraculixx, which I had to alter for SQLAlchemy:

    from tqdm import tqdm

    # these need to be customized: myDataFrame, myDBEngine, myDBTable
    df = myDataFrame

    def chunker(seq, size):
        return (seq[pos:pos + size] for pos in range(0, len(seq), size))

    def insert_with_progress(df):
        conn = myDBEngine.connect()
        chunksize = int(len(df) / 10)
        with tqdm(total=len(df)) as pbar:
            for cdf in chunker(df, chunksize):
                # always append here; the target table is never replaced
                cdf.to_sql(name="myDBTable", con=conn, if_exists="append", index=False)
                pbar.update(len(cdf))
                # resets tqdm's internal bar registry (private attribute)
                tqdm._instances.clear()

    insert_with_progress(df)


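User miraculixx has a nice example above, thank you for that. But if you want to use it with data of all sizes, note that chunksize becomes 0 when the DataFrame has fewer than 10 rows, and range() then raises a ValueError because its step is zero. You should add something like this: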

    chunksize = int(len(df) / 10)
    if chunksize == 0:
        # fewer than 10 rows: write everything in one call
        df.to_sql(con=con, name="MLS", if_exists="replace", index=False)
    else:
        with tqdm(total=len(df)) as pbar:
            ...
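An alternative to the special case is to keep the chunk size from ever reaching zero. The following is only a sketch under a few assumptions: the `table` and `n_chunks` parameters are names introduced here for illustration, the connection is a plain sqlite3 one, and the chunk size is rounded up rather than down, so the number of chunks never exceeds `n_chunks`.

```python
import sqlite3

import pandas as pd
from tqdm import tqdm


def insert_with_progress(df, con, table, n_chunks=10):
    """Write df to `table` in at most n_chunks pieces, updating a tqdm bar."""
    # Ceiling division, floored at 1, so the range() step below is never 0.
    chunksize = max(1, -(-len(df) // n_chunks))
    with tqdm(total=len(df)) as pbar:
        # Note: an empty DataFrame yields no chunks, so nothing is written.
        for pos in range(0, len(df), chunksize):
            cdf = df.iloc[pos:pos + chunksize]
            # Recreate the table on the first chunk, append afterwards.
            mode = "replace" if pos == 0 else "append"
            cdf.to_sql(name=table, con=con, if_exists=mode, index=False)
            # Advance by the rows actually written in this chunk.
            pbar.update(len(cdf))


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    insert_with_progress(pd.DataFrame({"a": range(7)}), con, "MLS")
    print(con.execute("SELECT COUNT(*) FROM MLS").fetchone()[0])
    con.close()
```

With 7 rows the chunk size comes out as 1, so the frame is written in seven single-row steps instead of failing on a zero step.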