
Optimal chunksize parameter in pandas.DataFrame.to_sql


In my case, inserting 3M rows with 5 columns took 8 minutes when I called the pandas to_sql function with chunksize=5000 and method='multi'. This was a huge improvement, since inserting 3M rows into the database from Python had been becoming very hard for me.
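For reference, a call with those two parameters might look roughly like the sketch below. The helper name insert_in_chunks, the table name demo, and the in-memory SQLite connection are assumptions made only so the example runs on its own; against a real server you would pass your own SQLAlchemy engine instead. The chunk size is kept small here because SQLite limits the number of bound parameters per statement, so treat the numbers as placeholders to tune, not as recommendations.

```python
import sqlite3

import pandas as pd


def insert_in_chunks(df, con, table, chunksize=5000):
    """Append df to `table`, sending `chunksize` rows per multi-row INSERT."""
    df.to_sql(
        table,
        con,
        if_exists="append",
        index=False,
        chunksize=chunksize,  # rows written per batch
        method="multi",       # pass many rows in a single INSERT statement
    )


# Assumption: an in-memory SQLite database stands in for the real server.
con = sqlite3.connect(":memory:")

# Hypothetical data: 1000 rows, 5 integer columns.
df = pd.DataFrame({f"c{i}": range(1000) for i in range(5)})

# 100 rows x 5 columns = 500 bound parameters per statement.
insert_in_chunks(df, con, "demo", chunksize=100)

row_count = con.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
```

With method='multi', each batch binds chunksize times the number of columns as parameters, so the best chunksize likely depends on the column count and on the database's own limits. Timing a few values on your own data is probably the only reliable way to pick one.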


I tried it the other way around, from SQL to CSV, and noticed that the smaller the chunksize, the quicker the job finished. Adding more CPUs to the job (multiprocessing) didn't change anything.
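A chunked SQL-to-CSV export along those lines could be sketched as follows. This is not necessarily the code that was used for the measurement above; the function name export_query_to_csv, the table src, and the SQLite connection are hypothetical and only there to make the example self-contained.

```python
import os
import sqlite3
import tempfile

import pandas as pd


def export_query_to_csv(query, con, path, chunksize=10_000):
    """Stream the result of `query` to a CSV file, one chunk at a time."""
    rows_written = 0
    # With chunksize set, read_sql_query returns an iterator of DataFrames.
    for i, chunk in enumerate(pd.read_sql_query(query, con, chunksize=chunksize)):
        chunk.to_csv(
            path,
            mode="w" if i == 0 else "a",  # create the file, then append
            header=(i == 0),              # write the header only once
            index=False,
        )
        rows_written += len(chunk)
    return rows_written


# Assumption: a small in-memory SQLite table stands in for the real source.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE src (id INTEGER, value TEXT)")
con.executemany("INSERT INTO src VALUES (?, ?)", [(i, f"v{i}") for i in range(250)])

csv_path = os.path.join(tempfile.mkdtemp(), "export.csv")
exported = export_query_to_csv("SELECT * FROM src", con, csv_path, chunksize=100)
round_trip = pd.read_csv(csv_path)
```

Only one chunk is held in memory at a time, which may be part of why smaller chunks helped in that case. The loop is sequential, so extra processes would not be expected to speed it up unless the query itself were split across workers.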