Progress bar for pandas.DataFrame.to_sql

python sqlite pandas dataframe tqdm

Unfortunately, DataFrame.to_sql does not provide a chunk-by-chunk callback, which tqdm needs to update its status. However, you can process the DataFrame chunk by chunk yourself:

import sqlite3

import pandas as pd
from tqdm import tqdm

DB_FILENAME = '/tmp/test.sqlite'

def chunker(seq, size):
    # from http://stackoverflow.com/a/434328
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def insert_with_progress(df, dbfile):
    con = sqlite3.connect(dbfile)
    chunksize = int(len(df) / 10)  # 10% of the rows per chunk
    with tqdm(total=len(df)) as pbar:
        for i, cdf in enumerate(chunker(df, chunksize)):
            # replace the table on the first chunk, append afterwards
            replace = "replace" if i == 0 else "append"
            cdf.to_sql(con=con, name="MLS", if_exists=replace, index=False)
            # advance by the rows actually written (the last chunk may be shorter)
            pbar.update(len(cdf))
    con.close()

df = pd.DataFrame({'a': range(0, 100000)})
insert_with_progress(df, DB_FILENAME)

Note that I'm generating the DataFrame inline here so that the example is complete and runnable without any external data.

The result is quite stunning: a tqdm progress bar that advances as each chunk is written.


I wanted to share a variant of the solution posted by miraculixx, which I had to alter for SQLAlchemy:

# these need to be customized: myDataFrame, myDBEngine, myDBTable
from tqdm import tqdm

df = myDataFrame

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def insert_with_progress(df):
    con = myDBEngine.connect()
    chunksize = int(len(df) / 10)
    with tqdm(total=len(df)) as pbar:
        for cdf in chunker(df, chunksize):
            # every chunk is appended, so an existing table is never replaced
            cdf.to_sql(name="myDBTable", con=con, if_exists="append", index=False)
            pbar.update(len(cdf))
    # relies on a private tqdm attribute to drop leftover bar instances
    tqdm._instances.clear()

insert_with_progress(df)
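For readers who want to try the SQLAlchemy variant without a database at hand, here is a minimal sketch. The in-memory SQLite URL, the table name demo_table and the sample DataFrame are assumptions made for illustration; they are not part of the answer above. It also wraps all chunks in one transaction via engine.begin(), which is a design choice of this sketch rather than something the original answer does.

```python
import pandas as pd
from sqlalchemy import create_engine, text
from tqdm import tqdm

# Assumption: an in-memory SQLite database stands in for the real engine.
engine = create_engine("sqlite://")
df = pd.DataFrame({"a": range(1000)})

# Roughly 10 chunks; max(1, ...) keeps the step positive for tiny frames.
chunksize = max(1, len(df) // 10)

# engine.begin() commits when the block succeeds and rolls back on error,
# so either every chunk is stored or none is.
with engine.begin() as con, tqdm(total=len(df)) as pbar:
    for start in range(0, len(df), chunksize):
        cdf = df.iloc[start:start + chunksize]
        cdf.to_sql(name="demo_table", con=con, if_exists="append", index=False)
        pbar.update(len(cdf))  # rows actually written in this chunk

# Read the row count back to check that everything arrived.
with engine.connect() as con:
    n_written = con.execute(text("SELECT COUNT(*) FROM demo_table")).scalar()
print(n_written)
```

If a partially loaded table is acceptable, opening the connection with engine.connect() and committing per chunk would work as well.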


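User miraculixx has a nice example above, thank you for that. But if you want to use it with data of any size, you should add something like the following. For a DataFrame with fewer than 10 rows, int(len(df) / 10) is 0, and range() raises a ValueError when its step is 0: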

chunksize = int(len(df) / 10)
if chunksize == 0:
    df.to_sql(con=con, name="MLS", if_exists="replace", index=False)
else:
    with tqdm(total=len(df)) as pbar:
        ...
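An alternative to the separate branch is to clamp the chunk size to at least 1, so that small and large DataFrames follow the same code path. The sketch below is one possible way to do that, not the approach of any answer here; the in-memory database, the table name demo and the n_chunks parameter are assumptions for illustration.

```python
import sqlite3

import pandas as pd
from tqdm import tqdm


def insert_with_progress(df, con, table, n_chunks=10):
    """Write df to `table` in roughly n_chunks pieces, updating a tqdm bar."""
    # max(1, ...) keeps the step positive even when len(df) < n_chunks.
    chunksize = max(1, len(df) // n_chunks)
    with tqdm(total=len(df)) as pbar:
        for start in range(0, len(df), chunksize):
            cdf = df.iloc[start:start + chunksize]
            # Replace the table on the first chunk, append afterwards.
            mode = "replace" if start == 0 else "append"
            cdf.to_sql(name=table, con=con, if_exists=mode, index=False)
            # Advance by the rows actually written; the last chunk may be short.
            pbar.update(len(cdf))


con = sqlite3.connect(":memory:")
for n_rows in (3, 25, 1000):
    df = pd.DataFrame({"a": range(n_rows)})
    insert_with_progress(df, con, "demo")
    written = con.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
    print(n_rows, written)
```

Note that an empty DataFrame never enters the loop, so in that case no table is created at all.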
