peewee.OperationalError: too many SQL variables on upsert of only 150 rows * 8 columns
After some investigation, the problem appears to be related with the maximum number of parameters that a sql query can have: SQLITE_MAX_VARIABLE_NUMBER.
To be able to do big bulk inserts I first estimate SQLITE_MAX_VARIABLE_NUMBER and then use it to create chunks in the list of dictionaries I want to insert.
To estimate the value I use this function inspired by this answer:
def max_sql_variables(): """Get the maximum number of arguments allowed in a query by the current sqlite3 implementation. Based on `this question `_ Returns ------- int inferred SQLITE_MAX_VARIABLE_NUMBER """ import sqlite3 db = sqlite3.connect(':memory:') cur = db.cursor() cur.execute('CREATE TABLE t (test)') low, high = 0, 100000 while (high - 1) > low: guess = (high + low) // 2 query = 'INSERT INTO t VALUES ' + ','.join(['(?)' for _ in range(guess)]) args = [str(i) for i in range(guess)] try: cur.execute(query, args) except sqlite3.OperationalError as e: if "too many SQL variables" in str(e): high = guess else: raise else: low = guess cur.close() db.close() return lowSQLITE_MAX_VARIABLE_NUMBER = max_sql_variables()
Then I use the above variable to slice the data
with database.atomic() as txn: size = (SQLITE_MAX_VARIABLE_NUMBER // len(data[0])) -1 # remove one to avoid issue if peewee adds some variable for i in range(0, len(data), size): table.insert_many(data[i:i+size]).upsert().execute()
An update about execution speed of max_sql_variables
.
On a 3 years old Intel machine with 4 cores and 4 Gb of RAM, running OpenSUSE tumbleweed, with SQLITE_MAX_VARIABLE_NUMBER set to 999, the function runs in less that 100ms. If I set high = 1000000
, the execution time becomes of the order of 300ms.
On a younger Intel machine with 8 cores and 8Gb of RAM, running Kubuntu, with SQLITE_MAX_VARIABLE_NUMBER set to 250000, the function runs in about 2.6 seconds and returns 99999. If I set high = 1000000
, the execution time becomes of the order of 4.5 seconds.
Looking here, https://www.sqlite.org/limits.html#max_column it seems the limit should be 2000:
The SQLITE_MAX_COLUMN compile-time parameter is used to set an upper bound on:
- ... snip ...
- The number of values in an INSERT statement
I guess you're bumping against the limit somehow? At any rate, just chunk your input or re-compile SQLite with higher limits.