Returning an OFFSET subquery result in sqlite3
I wrote the function to save the counts I mentioned at the end of the original post, so here's one possible answer for removing the duplicate index search. I would still like to know if this is doable with straight SQL.
I created a passthrough user function to capture the count from thesubquery as the offset is calculated.
So instead of the original query:
SELECT id, prev_node, next_node FROM edges WHERE prev_node = ? LIMIT 1 OFFSET abs(random())%( SELECT count(*) FROM edges WHERE prev_node = ?);
I have something more like this:
SELECT id, prev_node, next_node FROM edges WHERE next_node = ? LIMIT 1 OFFSET abs(random())%( cache(?, (SELECT count(*) FROM edges WHERE prev_node = ?));
The first argument to cache() is a unique identifier for that count. Icould just use the value of prev_node, but due to the application Ineed to be able to cache the counts for forward and backward walksseparately. So I'm using "$direction:$prev_node_id" as the key.
The cache function looks like this (using Python):
_cache = {}def _cache_count(self, key, count): self._cache[key] = count return countconn.create_function("cache", 2, self._cache_count)
And then in the random walk function, I can cons up the hash key andcheck whether the count is already known. If it is, I use a variant ofthe main query that doesn't include the subquery:
uncached = "SELECT id, next_node, prev_node " \ "FROM edges WHERE prev_node = :last LIMIT 1 " \ "OFFSET abs(random())%cache(:key, " \ " (SELECT count(*) FROM edges WHERE prev_node = :last))"cached = "SELECT id, next_node, prev_node, has_space, count " \ "FROM edges WHERE prev_node = :last LIMIT 1 " \ "OFFSET abs(random())%:count"key = "%s:%s" % (direction, last_node)if key in cache: count = cache[key] query = cached args = dict(last=last_node, count=count)else: query = uncached args = dict(last=last_node, key=key)row = c.execute(query, args).fetchone()
The cached queries run about twice as fast as the uncached on average (5.7usvs. 10.9us).