Generate SQL statements from a Pandas Dataframe
If you only want the 'CREATE TABLE' sql code (and not the insert of the data), you can use the get_schema
function of the pandas.io.sql module:
    In [10]: print(pd.io.sql.get_schema(df.reset_index(), 'data'))
    CREATE TABLE "data" (
      "index" TIMESTAMP,
      "A" REAL,
      "B" REAL,
      "C" REAL,
      "D" REAL
    )
Some notes:
- I had to use reset_index because it otherwise didn't include the index.
- If you provide an sqlalchemy engine of a certain database flavor, the result will be adjusted to that flavor (e.g. the data type names).
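To make both notes concrete, here is a minimal sketch. The frame contents are made up for illustration, and it assumes SQLAlchemy is installed; an in-memory SQLite engine stands in for "a certain database flavor". The exact type names printed depend on the pandas and SQLAlchemy versions in use.

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical frame with a DatetimeIndex, similar in shape to the df above.
df = pd.DataFrame(
    {'A': [0.1, 0.2], 'B': [1.0, 2.0]},
    index=pd.date_range('2013-01-01', periods=2),
)

# Without a connection, pandas falls back to its built-in SQLite type names.
# reset_index() turns the unnamed index into a regular column called "index".
default_schema = pd.io.sql.get_schema(df.reset_index(), 'data')
print(default_schema)

# With an engine, the column types are rendered by that engine's dialect.
engine = create_engine('sqlite://')  # in-memory database, nothing touches disk
engine_schema = pd.io.sql.get_schema(df.reset_index(), 'data', con=engine)
print(engine_schema)
```

Comparing the two printed statements shows how the type names change once an engine is supplied.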
GENERATE SQL CREATE STATEMENT FROM DATAFRAME
    SOURCE = df
    TARGET = 'data'  # table name, passed to get_schema as a string
    def SQL_CREATE_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET):
        # SQL_CREATE_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET)
        # SOURCE: source dataframe
        # TARGET: target table to be created in database
        import pandas as pd
        sql_text = pd.io.sql.get_schema(SOURCE.reset_index(), TARGET)
        return sql_text
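Check the SQL CREATE TABLE Statement String
    # get_schema returns a single string, so print it directly
    sql_text = SQL_CREATE_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET)
    print(sql_text)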
GENERATE SQL INSERT STATEMENT FROM DATAFRAME
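    def SQL_INSERT_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET):
        sql_texts = []
        for index, row in SOURCE.iterrows():
            # row.tolist() yields plain Python scalars, so the tuple prints as
            # (1.0, 2.0) rather than (np.float64(1.0), np.float64(2.0)) on NumPy 2.x
            sql_texts.append('INSERT INTO ' + TARGET + ' (' + ', '.join(SOURCE.columns) + ') VALUES ' + str(tuple(row.tolist())))
        return sql_texts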
Check the SQL INSERT INTO Statement String
    sql_texts = SQL_INSERT_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET)
    print('\n\n'.join(sql_texts))
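One way to check the generated statements is to run them against a throwaway database. The sketch below is only an illustration: insert_statements is a hypothetical compact version of the function above (using row.tolist() so values print as plain Python scalars), the sample frame is made up, and an in-memory SQLite database is assumed as the target.

```python
import sqlite3

import pandas as pd


def insert_statements(frame, table):
    """Build one INSERT statement per row by string concatenation."""
    columns = ', '.join(frame.columns)
    return [
        'INSERT INTO ' + table + ' (' + columns + ') VALUES ' + str(tuple(row.tolist()))
        for _, row in frame.iterrows()
    ]


# Hypothetical sample data: one float column and one text column.
df = pd.DataFrame({'A': [1.5, 2.5], 'B': ['x', 'y']})
statements = insert_statements(df, 'data')
print('\n\n'.join(statements))

# Create the table from the frame's schema, then replay the statements.
conn = sqlite3.connect(':memory:')
conn.execute(pd.io.sql.get_schema(df, 'data'))
for statement in statements:
    conn.execute(statement)
rows = conn.execute('SELECT A, B FROM data ORDER BY A').fetchall()
print(rows)
conn.close()
```

This round trip only works for simple numeric and text values. Strings containing quotes, NaN values, and Timestamps are not rendered as valid SQL literals by str(tuple(...)).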
Insert Statement Solution
Not sure if this is the absolute best way to do it, but this is more efficient than using df.iterrows(), which is very slow. It also takes care of nan values with the help of regular expressions.
    import re

    def get_insert_query_from_df(df, dest_table):
        insert = """
        INSERT INTO `{dest_table}` (
        """.format(dest_table=dest_table)

        # Turn ['a', 'b'] into a, b with one column per line and no quotes
        columns_string = str(list(df.columns))[1:-1]
        columns_string = re.sub(r' ', '\n ', columns_string)
        columns_string = re.sub(r'\'', '', columns_string)

        values_string = ''
        for row in df.itertuples(index=False, name=None):
            # Note: this also rewrites 'nan' inside string values
            values_string += re.sub(r'nan', 'null', str(row))
            values_string += ',\n'

        # Drop the trailing ',\n' before closing the statement
        return insert + columns_string + ')\n VALUES\n' + values_string[:-2] + ';'
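Because the regular expression replaces every occurrence of nan, a text value such as 'banana' would come out as 'banullna'. A possible variation, sketched here under assumptions, formats each value on its own instead. sql_literal and insert_query_per_value are hypothetical names, the sample data is made up, and the table name is left unquoted so the check can run on SQLite.

```python
import math
import sqlite3

import pandas as pd


def sql_literal(value):
    """Render one Python value as a SQL literal (hypothetical helper)."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return 'NULL'
    if isinstance(value, str):
        # Double any single quotes so the literal stays well-formed.
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def insert_query_per_value(df, dest_table):
    """Build one multi-row INSERT, mapping NaN to NULL value by value."""
    columns = ', '.join(df.columns)
    rows = [
        '(' + ', '.join(sql_literal(v) for v in row) + ')'
        for row in df.itertuples(index=False, name=None)
    ]
    return 'INSERT INTO {} ({})\nVALUES\n{};'.format(dest_table, columns, ',\n'.join(rows))


# Hypothetical data: a name containing "nan", a name with a quote, and a NaN.
df = pd.DataFrame({'name': ['banana', "O'Neil"], 'score': [1.5, float('nan')]})
query = insert_query_per_value(df, 'fruit')
print(query)

# Check the statement against an in-memory SQLite table.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE fruit (name TEXT, score REAL)')
conn.execute(query)
stored = conn.execute('SELECT name, score FROM fruit ORDER BY rowid').fetchall()
conn.close()
```

The sketch only covers None, NaN, strings, and plain numbers. Dates and other types would need their own formatting rules, and for actually loading data a parameterized query is usually the safer route.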