
Parallelizing pandas pyodbc SQL database calls


Yes, this should work, with the caveat that you'll need to change parallel_connection.py from the talk you cite. That module has a fetchall method that calls fetchall on each of its cursors in parallel and then combines the results. This is the core of what you'll change:

Old Code:

```python
from itertools import chain  # flattens the per-cursor row lists


def fetchall(self):
    results = [None] * len(self.cursors)

    def do_work(index, cursor):
        results[index] = cursor.fetchall()

    self._do_parallel(do_work)
    return list(chain(*[rs for rs in results]))
```

New Code:

```python
import pandas as pd


def fetchall(self):
    results = [None] * len(self.sql_connections)

    def do_work(index, sql_connection):
        sql, conn = sql_connection  # Store tuple of sql/conn instead of cursor
        results[index] = pd.read_sql(sql, conn)

    self._do_parallel(do_work)
    # DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames instead
    return pd.concat(results, ignore_index=True)
```

Repo: https://github.com/godatadriven/ParallelConnection
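If you would rather not patch the library, the same idea can be written standalone. The following is a minimal sketch under several assumptions, not the ParallelConnection code: `concurrent.futures.ThreadPoolExecutor` stands in for the library's `_do_parallel`, each task opens its own connection (rather than reusing long-lived ones), throwaway sqlite3 files stand in for your pyodbc databases so it runs anywhere, and `parallel_read_sql` and `make_demo_db` are hypothetical names. Threads are usually a reasonable fit here because most of the time is typically spent waiting on the database.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def parallel_read_sql(tasks, max_workers=None):
    """Hypothetical helper: run pd.read_sql for each (sql, connect) pair in a
    thread pool and stack the resulting DataFrames in task order."""

    def do_work(task):
        sql, connect = task
        # One connection per task, opened and closed inside the worker thread:
        # pyodbc (like sqlite3) should not share a connection between threads.
        conn = connect()
        try:
            return pd.read_sql(sql, conn)
        finally:
            conn.close()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, like results[index] in fetchall.
        frames = list(pool.map(do_work, tasks))
    return pd.concat(frames, ignore_index=True)


def make_demo_db(path, shard):
    """Hypothetical demo helper: a one-row SQLite file standing in for one
    of the databases you would normally reach through pyodbc."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (shard INTEGER, val TEXT)")
    conn.execute("INSERT INTO t VALUES (?, ?)", (shard, f"row-{shard}"))
    conn.commit()
    conn.close()


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    paths = [os.path.join(tmpdir, f"db{i}.sqlite") for i in range(2)]
    for i, path in enumerate(paths):
        make_demo_db(path, i)
    # With pyodbc you would pass something like lambda: pyodbc.connect(conn_str).
    tasks = [("SELECT shard, val FROM t", lambda p=path: sqlite3.connect(p))
             for path in paths]
    print(parallel_read_sql(tasks))
```

Passing a connect callable instead of an open connection keeps each connection confined to the thread that uses it, which sidesteps the thread-safety question entirely at the cost of one connect per query.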