Parallelizing pandas pyodbc SQL database calls
Yes, this should work, with the caveat that you'll need to change parallel_connection.py from the talk you cite. That code has a fetchall
function that fetches from each of the cursors in parallel, then combines the results. This is the core of what you'll change:
Old Code:

    def fetchall(self):
        results = [None] * len(self.cursors)

        def do_work(index, cursor):
            results[index] = cursor.fetchall()

        self._do_parallel(do_work)
        return list(chain(*[rs for rs in results]))
New Code:

    def fetchall(self):
        results = [None] * len(self.sql_connections)

        def do_work(index, sql_connection):
            # Store a tuple of (sql, conn) instead of a cursor
            sql, conn = sql_connection
            results[index] = pd.read_sql(sql, conn)

        self._do_parallel(do_work)
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        return pd.concat(results)