Progress bar for pandas.DataFrame.to_sql
Unfortunately, DataFrame.to_sql does not provide a chunk-by-chunk callback, which tqdm needs in order to update its status. However, you can process the dataframe chunk by chunk yourself:
import sqlite3
import pandas as pd
from tqdm import tqdm

DB_FILENAME = '/tmp/test.sqlite'

def chunker(seq, size):
    # from http://stackoverflow.com/a/434328
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def insert_with_progress(df, dbfile):
    con = sqlite3.connect(dbfile)
    chunksize = int(len(df) / 10)  # 10% of the rows per chunk
    with tqdm(total=len(df)) as pbar:
        for i, cdf in enumerate(chunker(df, chunksize)):
            # recreate the table on the first chunk, append afterwards
            replace = "replace" if i == 0 else "append"
            cdf.to_sql(con=con, name="MLS", if_exists=replace, index=False)
            # advance by the rows actually written; the last chunk may be shorter
            pbar.update(len(cdf))

df = pd.DataFrame({'a': range(0, 100000)})
insert_with_progress(df, DB_FILENAME)
Note I'm generating the DataFrame inline here for the sake of having a complete workable example without dependency.
The result is quite stunning: a tqdm progress bar that advances as each chunk is written.
I wanted to share a variant of the solution posted by miraculixx, which I had to alter for SQLAlchemy:
from tqdm import tqdm

# these need to be customized - myDataFrame, myDBEngine, myDBTable
df = myDataFrame

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def insert_with_progress(df):
    con = myDBEngine.connect()
    chunksize = int(len(df) / 10)
    with tqdm(total=len(df)) as pbar:
        for cdf in chunker(df, chunksize):
            # every chunk is appended; use "replace" for the first chunk
            # instead if the table should be recreated
            cdf.to_sql(name="myDBTable", con=con, if_exists="append", index=False)
            pbar.update(len(cdf))
    # clears leftover progress bars; note that _instances is a private tqdm attribute
    tqdm._instances.clear()

insert_with_progress(df)
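User miraculixx has a nice example above, thank you for that. But if you want to use it with files of all sizes, including ones that yield fewer than 10 rows, note that chunksize then becomes 0 and range() raises a ValueError for a zero step. So you should add something like this: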
chunksize = int(len(df) / 10)
if chunksize == 0:
    df.to_sql(con=con, name="MLS", if_exists="replace", index=False)
else:
    with tqdm(total=len(df)) as pbar:
        ...
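An alternative to the separate branch, as a minimal sketch rather than a definitive version: clamp the chunk size to at least 1 so that the same loop handles small frames too. The `steps` and `table` parameters, the in-memory SQLite connection and the `small` frame are assumptions made for illustration; they are not part of the answers above.

```python
import sqlite3

import pandas as pd
from tqdm import tqdm


def chunker(seq, size):
    """Yield consecutive slices of `seq` with at most `size` rows each."""
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))


def insert_with_progress(df, con, table="MLS", steps=10):
    """Write `df` to `table` in roughly `steps` chunks, updating a tqdm bar."""
    # max(1, ...) keeps the range() step non-zero for frames with fewer than `steps` rows.
    chunksize = max(1, len(df) // steps)
    with tqdm(total=len(df)) as pbar:
        for i, cdf in enumerate(chunker(df, chunksize)):
            # Recreate the table on the first chunk, then append the rest.
            mode = "replace" if i == 0 else "append"
            cdf.to_sql(name=table, con=con, if_exists=mode, index=False)
            # Advance by the rows actually written; the last chunk may be shorter.
            pbar.update(len(cdf))


con = sqlite3.connect(":memory:")  # in-memory database, for illustration only
small = pd.DataFrame({"a": range(3)})  # fewer than 10 rows
insert_with_progress(small, con)
rows_written = con.execute("SELECT COUNT(*) FROM MLS").fetchone()[0]
```

With the clamp, a three-row frame is written one row per chunk, and larger frames still go through in about ten steps (one more when the length is not a multiple of the chunk size).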