Querying SQLite for multiple arguments at once and handling missing values
To perform all the work in sqlite, you could use a LEFT JOIN to fill in missing prices with None:
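    # The id filter goes in the ON clause: in a WHERE clause it would
    # discard the NULL-price rows that the LEFT JOIN is meant to keep.
    sql = '''
        SELECT p.price, t.date
        FROM ( {t} ) t
        LEFT JOIN price p
        ON p.date = t.date AND p.id = ?
        '''.format(t=' UNION ALL '.join(
            'SELECT {d!r} date'.format(d=d) for d in date))
    cursor.execute(sql, [id])
    result = cursor.fetchall()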
However, this solution requires forming a (potentially) huge string in Python in order to create a temporary table of all desired dates. It is not only slow (including the time it takes sqlite to create the temporary table), it is also brittle: if len(date) is greater than 500 (the default value of sqlite's SQLITE_MAX_COMPOUND_SELECT limit), then sqlite raises

    OperationalError: too many terms in compound SELECT
You might be able to get around this if you already have all the desired dates in some other table. Then you could replace the ugly "UNION ALL" SQL above with something like
    SELECT p.price, t.date
    FROM ( SELECT date from dates ) t
    LEFT JOIN price p
    ON p.date = t.date
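For illustration, here is a minimal, self-contained sketch of the join against a dates table. The in-memory database, the column types, and the sample rows are assumptions made up for this example, not part of the original setup. The id filter is placed in the ON clause so that dates without a matching price still appear in the result, with None as the price:

```python
import sqlite3

# Hypothetical in-memory schema: a price table and a dates table.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE price (id INTEGER, date TEXT, price REAL)')
cursor.execute('CREATE TABLE dates (date TEXT)')
cursor.executemany('INSERT INTO dates VALUES (?)',
                   [('2000-01-01',), ('2000-01-02',), ('2000-01-03',)])
# No price is recorded for 2000-01-02.
cursor.executemany('INSERT INTO price VALUES (?, ?, ?)',
                   [(1, '2000-01-01', 10.0), (1, '2000-01-03', 30.0)])

cursor.execute('''
    SELECT p.price, t.date
    FROM dates t
    LEFT JOIN price p
    ON p.date = t.date AND p.id = ?
    ORDER BY t.date''', [1])
joined = cursor.fetchall()
# joined == [(10.0, '2000-01-01'), (None, '2000-01-02'), (30.0, '2000-01-03')]
```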
While this is an improvement, my timeit tests (see below) show that doing part of the work in Python is still faster:
Doing part of the work in Python:
If you know that the dates are consecutive and can therefore be expressed as a range, then:
    curs.execute('''
        SELECT date, price
        FROM price
        WHERE date <= ? AND date >= ? AND id = ?''',
        (max(date), min(date), id))
Otherwise, if the dates are arbitrary then:
    # One '?' placeholder per date; note there are no braces around the join.
    sql = '''
        SELECT date, price
        FROM price
        WHERE date IN ({s}) AND id = ?'''.format(s=','.join(['?'] * len(date)))
    curs.execute(sql, list(date) + [id])
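One caveat with the IN query: sqlite also caps the number of ? parameters in a single statement (SQLITE_MAX_VARIABLE_NUMBER, which I believe defaults to 999 in older builds and is larger in newer ones). A possible workaround is to query in chunks. The sketch below assumes a hypothetical helper named fetch_prices, a made-up chunk_size parameter, and an in-memory table with sample rows; none of these come from the original question:

```python
import sqlite3

def fetch_prices(cursor, dates, id_, chunk_size=500):
    """Hypothetical helper: return (date, price) rows for the given dates and id."""
    rows = []
    for start in range(0, len(dates), chunk_size):
        chunk = list(dates[start:start + chunk_size])
        # One '?' per date in this chunk, e.g. '?,?,?'
        placeholders = ','.join('?' * len(chunk))
        sql = '''
            SELECT date, price
            FROM price
            WHERE date IN ({s}) AND id = ?'''.format(s=placeholders)
        cursor.execute(sql, chunk + [id_])
        rows.extend(cursor.fetchall())
    return rows

# Made-up sample data for the demonstration.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE price (id INTEGER, date TEXT, price REAL)')
cursor.executemany('INSERT INTO price VALUES (?, ?, ?)',
                   [(1, '2000-01-01', 10.0), (1, '2000-01-03', 30.0),
                    (2, '2000-01-01', 99.0)])

wanted = ['2000-01-01', '2000-01-02', '2000-01-03']
found = fetch_prices(cursor, wanted, 1, chunk_size=2)
```

The chunk size of 500 is an arbitrary choice that stays under the older default limit; tune it to your build.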
To form the result list with None inserted for missing prices, you could form a dict out of (date, price) pairs, and use the dict.get() method to supply the default value None when the date key is missing:
    result = dict(curs.fetchall())
    result = [(result.get(d, None), d) for d in date]
Note that, to form the dict as a mapping from dates to prices, I swapped the order of date and price in the SQL queries.
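Putting the range query and the dict lookup together, a self-contained sketch might look like the following. The in-memory table, the sample rows, and the names id_ and price_by_date are assumptions for illustration only:

```python
import sqlite3

# Made-up sample data: no price is recorded for 2000-01-02.
conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE price (id INTEGER, date TEXT, price REAL)')
curs.executemany('INSERT INTO price VALUES (?, ?, ?)',
                 [(1, '2000-01-01', 10.0), (1, '2000-01-03', 30.0)])

date = ['2000-01-01', '2000-01-02', '2000-01-03']
id_ = 1  # trailing underscore avoids shadowing the built-in id()

# Fetch (date, price) pairs so that dict() maps each date to its price.
curs.execute('''
    SELECT date, price
    FROM price
    WHERE date <= ? AND date >= ? AND id = ?''',
    (max(date), min(date), id_))
price_by_date = dict(curs.fetchall())

# dict.get() returns None for dates that have no row in the table.
result = [(price_by_date.get(d), d) for d in date]
# result == [(10.0, '2000-01-01'), (None, '2000-01-02'), (30.0, '2000-01-03')]
```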
Timeit tests:
I compared these three functions:
    import datetime
    import random

    def using_sqlite_union():
        sql = '''
            SELECT p.price, t.date
            FROM ( {t} ) t
            LEFT JOIN price p
            ON p.date = t.date
            '''.format(t=' UNION ALL '.join(
                'SELECT {d!r} date'.format(d=str(d)) for d in dates))
        cursor.execute(sql)
        return cursor.fetchall()

    def using_sqlite_dates():
        sql = '''
            SELECT p.price, t.date
            FROM ( SELECT date from dates ) t
            LEFT JOIN price p
            ON p.date = t.date
            '''
        cursor.execute(sql)
        return cursor.fetchall()

    def using_python_dict():
        cursor.execute('''
            SELECT date, price
            FROM price
            WHERE date <= ? AND date >= ?
            ''', (max(dates), min(dates)))
        result = dict(cursor.fetchall())
        result = [(result.get(d, None), d) for d in dates]
        return result

    N = 500   # total number of dates
    m = 10    # number of dates left without a price
    omit = random.sample(range(N), m)
    dates = [datetime.date(2000, 1, 1) + datetime.timedelta(days=i)
             for i in range(N)]
    rows = [(d, random.random()) for i, d in enumerate(dates) if i not in omit]
rows defines the data that was INSERTed into the price table.
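The setup code (connection, table creation, inserts) is not shown here. A minimal sketch of what it could look like follows, assuming an in-memory database, dates stored as ISO-format strings rather than datetime.date objects, and no id column. These choices are mine for illustration and may differ from the module that was actually timed:

```python
import datetime
import random
import sqlite3
import timeit

N = 500   # total number of dates
m = 10    # number of dates left without a price
omit = set(random.sample(range(N), m))
# Assumption: dates are stored as ISO strings such as '2000-01-01'.
dates = [(datetime.date(2000, 1, 1) + datetime.timedelta(days=i)).isoformat()
         for i in range(N)]
rows = [(d, random.random()) for i, d in enumerate(dates) if i not in omit]

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE price (date TEXT, price REAL)')
cursor.execute('CREATE TABLE dates (date TEXT)')
cursor.executemany('INSERT INTO price VALUES (?, ?)', rows)
cursor.executemany('INSERT INTO dates VALUES (?)', [(d,) for d in dates])
conn.commit()

def using_python_dict():
    cursor.execute('''
        SELECT date, price
        FROM price
        WHERE date <= ? AND date >= ?
        ''', (max(dates), min(dates)))
    found = dict(cursor.fetchall())
    return [(found.get(d), d) for d in dates]

# Time 10 calls from within Python instead of the command line.
elapsed = timeit.timeit(using_python_dict, number=10)
```

ISO-format strings sort chronologically, so max(dates) and min(dates) still bound the range correctly.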
Timeit test results:
Running timeit like this:
python -mtimeit -s'import timeit_sqlite_union as t' 't.using_python_dict()'
produced these benchmarks:
·────────────────────·────────────────────·
│ using_python_dict  │ 1.47 msec per loop │
│ using_sqlite_dates │ 3.39 msec per loop │
│ using_sqlite_union │ 5.69 msec per loop │
·────────────────────·────────────────────·
using_python_dict is about 2.3 times faster than using_sqlite_dates. Even if we increase the total number of dates to 10000, the speed ratio stays roughly the same (about 2.5):
·────────────────────·────────────────────·
│ using_python_dict  │ 32.5 msec per loop │
│ using_sqlite_dates │ 81.5 msec per loop │
·────────────────────·────────────────────·
Conclusion: shifting all the work into sqlite is not necessarily faster.