
Query Sqlite for multiple arguments at once and handling missing values


To perform all the work in sqlite, you could use a LEFT JOIN to fill in missing prices with None:

sql = '''
    SELECT p.price, t.date
    FROM ( {t} ) t
    LEFT JOIN price p
    ON p.date = t.date
    WHERE p.id = ?
'''.format(t=' UNION ALL '.join('SELECT {d!r} date'.format(d=d) for d in date))
cursor.execute(sql, [id])
result = cursor.fetchall()

However, this solution requires building a (potentially) huge SQL string in Python in order to create a temporary table of all the desired dates. Not only is it slow (including the time it takes sqlite to create the temporary table), it is also brittle: if len(date) is greater than about 500, then sqlite raises

OperationalError: too many terms in compound SELECT
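For reference, this corresponds to SQLite's default limit of 500 terms per compound SELECT. A minimal sketch that reproduces the error (the 501-term query here is purely illustrative) could look like this:

import sqlite3

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# A compound SELECT with 501 arms exceeds SQLite's default limit of 500.
sql = ' UNION ALL '.join('SELECT {}'.format(i) for i in range(501))
try:
    cursor.execute(sql)
except sqlite3.OperationalError as e:
    print(e)  # too many terms in compound SELECT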

You might be able to get around this if you already have all the desired dates in some other table. Then you could replace the ugly "UNION ALL" SQL above with something like

SELECT p.price, t.date
FROM ( SELECT date from dates ) t
LEFT JOIN price p
ON p.date = t.date
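If you do not already have such a table, one way to set it up on the fly is a temporary table filled with executemany. This is only a sketch; the dates table name, its single column, and storing dates as TEXT are just one possible layout:

import sqlite3
import datetime

# Illustrative setup: an in-memory database and a small list of desired dates.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
date = [datetime.date(2000, 1, 1) + datetime.timedelta(days=i) for i in range(5)]

# Fill a temporary table with every desired date, so the query can
# LEFT JOIN against it instead of building a long UNION ALL string.
cursor.execute('CREATE TEMP TABLE dates (date TEXT)')
cursor.executemany('INSERT INTO dates (date) VALUES (?)',
                   [(str(d),) for d in date])
conn.commit()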

While this is an improvement, my timeit tests (see below) show that doing part of the work in Python is still faster:


Doing part of the work in Python:

If you know that the dates are consecutive and can therefore be expressed as a range, then:

curs.execute('''
    SELECT date, price
    FROM prices
    WHERE date <= ?
        AND date >= ?
        AND id = ?
''', (max(date), min(date), id))

Otherwise, if the dates are arbitrary, then:

sql = '''
    SELECT date, price
    FROM prices
    WHERE date IN ({s})
        AND id = ?
'''.format(s=','.join(['?'] * len(date)))
curs.execute(sql, date + [id])

To form the result list with None inserted for missing prices, you could form a dict out of (date,price) pairs, and use the dict.get() method to supply the default value None when the date key is missing:

result = dict(curs.fetchall())
result = [(result.get(d, None), d) for d in date]

Note that, to form the dict as a mapping from dates to prices, I swapped the order of date and price in the SQL queries.
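As a small illustration (with made-up values, not from the database), this is what the lookup does for a date that has no price row:

import datetime

# Suppose the query returned prices for Jan 1 and Jan 3, but Jan 2 is missing.
fetched = [(datetime.date(2000, 1, 1), 10.0),
           (datetime.date(2000, 1, 3), 12.5)]
date = [datetime.date(2000, 1, 1),
        datetime.date(2000, 1, 2),
        datetime.date(2000, 1, 3)]

prices = dict(fetched)  # maps each date to its price
result = [(prices.get(d, None), d) for d in date]
# result == [(10.0, datetime.date(2000, 1, 1)),
#            (None, datetime.date(2000, 1, 2)),
#            (12.5, datetime.date(2000, 1, 3))]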


Timeit tests:

I compared these three functions:

def using_sqlite_union():
    sql = '''
        SELECT p.price, t.date
        FROM ( {t} ) t
        LEFT JOIN price p
        ON p.date = t.date
    '''.format(t=' UNION ALL '.join('SELECT {d!r} date'.format(d=str(d))
                                    for d in dates))
    cursor.execute(sql)
    return cursor.fetchall()

def using_sqlite_dates():
    sql = '''
        SELECT p.price, t.date
        FROM ( SELECT date from dates ) t
        LEFT JOIN price p
        ON p.date = t.date
    '''
    cursor.execute(sql)
    return cursor.fetchall()

def using_python_dict():
    cursor.execute('''
        SELECT date, price
        FROM price
        WHERE date <= ?
            AND date >= ?
        ''', (max(dates), min(dates)))
    result = dict(cursor.fetchall())
    result = [(result.get(d, None), d) for d in dates]
    return result

N = 500
m = 10
omit = random.sample(range(N), m)
dates = [datetime.date(2000, 1, 1) + datetime.timedelta(days=i) for i in range(N)]
rows = [(d, random.random()) for i, d in enumerate(dates) if i not in omit]

Here rows defines the data that gets INSERTed into the price table.
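The table creation and INSERTs themselves are not shown above; a plausible reconstruction (the DATE-typed schema and PARSE_DECLTYPES are assumptions, chosen so that sqlite3 hands back datetime.date objects that compare equal to the entries of dates) is:

import sqlite3

# Assumed setup: `dates` and `rows` as built in the listing above.
conn = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)
cursor = conn.cursor()
cursor.execute('CREATE TABLE price (date DATE, price REAL)')
cursor.execute('CREATE TABLE dates (date DATE)')
cursor.executemany('INSERT INTO price VALUES (?, ?)', rows)
cursor.executemany('INSERT INTO dates VALUES (?)', [(d,) for d in dates])
conn.commit()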


Timeit test results:

Running timeit like this:

python -mtimeit -s'import timeit_sqlite_union as t' 't.using_python_dict()'

produced these benchmarks:

·────────────────────·────────────────────·
│  using_python_dict │ 1.47 msec per loop │
│ using_sqlite_dates │ 3.39 msec per loop │
│ using_sqlite_union │ 5.69 msec per loop │
·────────────────────·────────────────────·

using_python_dict is about 2.3 times faster than using_sqlite_dates. Even if we increase the total number of dates to 10000, the speed ratio remains roughly the same:

·────────────────────·────────────────────·
│  using_python_dict │ 32.5 msec per loop │
│ using_sqlite_dates │ 81.5 msec per loop │
·────────────────────·────────────────────·

Conclusion: shifting all the work into sqlite is not necessarily faster.