Data structure for finding nearby keys with similar bitvalues Data structure for finding nearby keys with similar bitvalues database database

Data structure for finding nearby keys with similar bitvalues


What you want is a BK-Tree. It's a tree that's ideally suited to indexing metric spaces (your problem is one), and supports both nearest-neighbour and distance queries. I wrote an article about it a while ago.

BK-Trees are generally described with reference to text and using levenshtein distance to build the tree, but it's straightforward to write one in terms of binary strings and hamming distance.


This sounds like a good fit for an S-Tree, which is like a hierarchical inverted file. Good resources on this topic include the following papers:

Hierarchical Bitmap Index: An Efficient and Scalable Indexing Technique for Set-Valued Attributes.

Improved Methods for Signature-Tree Construction (2000)

Quote from the first one:

The hierarchical bitmap index efficiently supports dif- ferent classes of queries, including subset, superset and similarity queries. Our experiments show that the hierarchical bitmap index outperforms other set indexing techniques significantly.

These papers include references to other research that you might find useful, such as M-Trees.


Create a binary tree (specifically a trie) representing each key in your start set in the following way: The root node is the empty word, moving down the tree to the left appends a 0 and moving down the right appends a 1. The tree will only have as many leaves as your start set has elements, so the size should stay manageable.

Now you can do a recursive traversal of this tree, allowing at most n "deviations" from the query key in each recursive line of execution, until you have found all of the nodes in the start set which are within that number of deviations.