Returning an OFFSET subquery result in sqlite3 Returning an OFFSET subquery result in sqlite3 sqlite sqlite

Returning an OFFSET subquery result in sqlite3


I wrote the function to save the counts I mentioned at the end of the original post, so here's one possible answer for removing the duplicate index search. I would still like to know if this is doable with straight SQL.

I created a passthrough user function to capture the count from thesubquery as the offset is calculated.

So instead of the original query:

SELECT id, prev_node, next_node FROM edges WHERE prev_node = ? LIMIT 1    OFFSET abs(random())%(    SELECT count(*) FROM edges WHERE prev_node = ?);

I have something more like this:

SELECT id, prev_node, next_node FROM edges WHERE next_node = ? LIMIT 1    OFFSET abs(random())%(    cache(?, (SELECT count(*) FROM edges WHERE prev_node = ?));

The first argument to cache() is a unique identifier for that count. Icould just use the value of prev_node, but due to the application Ineed to be able to cache the counts for forward and backward walksseparately. So I'm using "$direction:$prev_node_id" as the key.

The cache function looks like this (using Python):

_cache = {}def _cache_count(self, key, count):    self._cache[key] = count    return countconn.create_function("cache", 2, self._cache_count)

And then in the random walk function, I can cons up the hash key andcheck whether the count is already known. If it is, I use a variant ofthe main query that doesn't include the subquery:

uncached = "SELECT id, next_node, prev_node " \    "FROM edges WHERE prev_node = :last LIMIT 1 " \    "OFFSET abs(random())%cache(:key, " \    "    (SELECT count(*) FROM edges WHERE prev_node = :last))"cached = "SELECT id, next_node, prev_node, has_space, count " \    "FROM edges WHERE prev_node = :last LIMIT 1 " \    "OFFSET abs(random())%:count"key = "%s:%s" % (direction, last_node)if key in cache:    count = cache[key]    query = cached    args = dict(last=last_node, count=count)else:    query = uncached    args = dict(last=last_node, key=key)row = c.execute(query, args).fetchone()

The cached queries run about twice as fast as the uncached on average (5.7usvs. 10.9us).