Generate SQL statements from a Pandas Dataframe


If you only want the CREATE TABLE SQL code (and not the insertion of the data), you can use the get_schema function of the pandas.io.sql module:

In [10]: print(pd.io.sql.get_schema(df.reset_index(), 'data'))
CREATE TABLE "data" (
  "index" TIMESTAMP,
  "A" REAL,
  "B" REAL,
  "C" REAL,
  "D" REAL
)

Some notes:

  • I had to use reset_index because otherwise the index is not included in the schema.
  • If you provide a SQLAlchemy engine for a particular database flavor (via the con argument), the result is adjusted to that flavor (e.g. the data type names).
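The first note can be checked directly. The following is a minimal sketch, assuming a small made-up DataFrame with an unnamed DatetimeIndex; no con argument is passed, so pandas falls back to its SQLite-style type names.

```python
import pandas as pd

# Made-up example data: two float columns and an unnamed DatetimeIndex
df = pd.DataFrame(
    {"A": [0.1, 0.2], "B": [1.5, 2.5]},
    index=pd.date_range("2013-01-01", periods=2),
)

# Without reset_index the index is left out of the generated schema
without_index = pd.io.sql.get_schema(df, "data")

# reset_index turns the index into a regular column named "index"
with_index = pd.io.sql.get_schema(df.reset_index(), "data")

print(without_index)
print(with_index)
```

If the index has a name, reset_index uses that name for the new column instead of "index".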


GENERATE SQL CREATE STATEMENT FROM DATAFRAME

SOURCE = df
TARGET = data

def SQL_CREATE_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET):
    # SQL_CREATE_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET)
    # SOURCE: source dataframe
    # TARGET: target table to be created in database
    import pandas as pd
    sql_text = pd.io.sql.get_schema(SOURCE.reset_index(), TARGET)
    return sql_text

Check the SQL CREATE TABLE Statement String

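sql_text = SQL_CREATE_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET)
print(sql_text)  # sql_text is a single string, so no join is needed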
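Beyond printing the statement, one way to check it is to execute it against a throwaway database. This is only a sketch, assuming an in-memory SQLite database from the standard library and a made-up two-column DataFrame; other databases may need a flavor-specific schema.

```python
import sqlite3

import pandas as pd

# Made-up example data
df = pd.DataFrame({"A": [1.0, 2.0], "B": ["x", "y"]})
sql_text = pd.io.sql.get_schema(df.reset_index(), "data")

# Run the CREATE TABLE statement in a temporary in-memory database
conn = sqlite3.connect(":memory:")
conn.execute(sql_text)

# PRAGMA table_info returns one row per column; field 1 is the column name
created_columns = [row[1] for row in conn.execute('PRAGMA table_info("data")')]
print(created_columns)
conn.close()
```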

GENERATE SQL INSERT STATEMENT FROM DATAFRAME

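def SQL_INSERT_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET):
    # Build one INSERT statement per row of SOURCE for the table named TARGET
    sql_texts = []
    for index, row in SOURCE.iterrows():
        # tolist() converts NumPy scalars to plain Python values before formatting
        sql_texts.append('INSERT INTO ' + TARGET + ' (' + ', '.join(SOURCE.columns) + ') VALUES ' + str(tuple(row.tolist())))
    return sql_texts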

Check the SQL INSERT INTO Statement String

sql_texts = SQL_INSERT_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET)
print('\n\n'.join(sql_texts))
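Formatting a whole row with str(tuple(...)) relies on Python's repr, so missing values come out as nan and strings containing a quote are not escaped. A possible variation is sketched below; to_sql_literal and insert_statements are hypothetical helper names, the sample data is made up, and table and column names are interpolated without quoting. Literal syntax differs between databases, so treat this as an illustration, not a general-purpose escaper.

```python
import pandas as pd


def to_sql_literal(value):
    """Render one scalar as a SQL literal (hypothetical helper)."""
    if pd.isna(value):
        return "NULL"  # None, NaN and NaT all become NULL
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return repr(value)
    # Everything else is written as a quoted string, doubling single quotes
    return "'" + str(value).replace("'", "''") + "'"


def insert_statements(df, table):
    """Return one INSERT statement per row of df (hypothetical helper)."""
    columns = ", ".join(str(c) for c in df.columns)
    statements = []
    for row in df.itertuples(index=False, name=None):
        values = ", ".join(to_sql_literal(v) for v in row)
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return statements


# Made-up example data with a quote in a string and a missing number
df = pd.DataFrame({"name": ["O'Brien", "Ann"], "score": [1.5, float("nan")]})
statements = insert_statements(df, "data")
print("\n".join(statements))
```

If the goal is to load the data and not to produce SQL text, DataFrame.to_sql or parameterized queries are usually the safer route.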


Insert Statement Solution

I'm not sure this is the absolute best way to do it, but it is more efficient than using df.iterrows(), which is very slow. It also converts nan values to null with the help of regular expressions.

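import re

def get_insert_query_from_df(df, dest_table):
    # Opening of a single multi-row INSERT statement
    insert = """
    INSERT INTO `{dest_table}` (
        """.format(dest_table=dest_table)

    # Column list: strip the list brackets and quotes, one column per line
    columns_string = str(list(df.columns))[1:-1]
    columns_string = re.sub(r' ', '\n        ', columns_string)
    columns_string = re.sub(r'\'', '', columns_string)

    # One "(v1, v2, ...)" tuple per row; a standalone nan becomes null
    values_string = ''
    for row in df.itertuples(index=False, name=None):
        values_string += re.sub(r'\bnan\b', 'null', str(row))
        values_string += ',\n'

    # Drop the trailing ",\n" and terminate the statement
    return insert + columns_string + ')\n     VALUES\n' + values_string[:-2] + ';'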
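Speed aside, there is a second difference between the two iteration methods that matters when generating SQL: iterrows builds one Series per row, so mixed numeric columns are upcast to a common type, while itertuples keeps each column's own type. A small sketch with made-up data:

```python
import pandas as pd

# Made-up example: an integer id column next to a float column
df = pd.DataFrame({"id": [1, 2], "x": [0.5, 1.5]})

# iterrows yields (index, Series); the row Series has a single dtype,
# so the integer id is upcast to float
first_from_iterrows = tuple(next(df.iterrows())[1].tolist())

# itertuples yields plain tuples and keeps the per-column types
first_from_itertuples = next(df.itertuples(index=False, name=None))

print(first_from_iterrows)
print(first_from_itertuples)
```

With iterrows, the id would therefore be written into the INSERT statement as a float instead of an integer.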