
*large* python dictionary with persistence storage for quick look-ups


If you want to persist a large dictionary, you are basically looking at a database.

Python comes with built-in support for SQLite through the sqlite3 module, which gives you an easy database solution backed by a single file on disk.
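As a rough illustration of the sqlite3 route, here is a minimal sketch of a dict-like wrapper over one table. The class name `SqliteDict`, the table name `kv`, and the file name are made up for this example, and it assumes both keys and values are strings.

```python
import os
import sqlite3
import tempfile


class SqliteDict:
    """Hypothetical minimal dict-like wrapper over a single sqlite3 table."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        # The PRIMARY KEY gives an index on key, so look-ups do not scan the table.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)"
        )

    def __setitem__(self, key, value):
        # "with self.conn" commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, value),
            )

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def close(self):
        self.conn.close()


# Assumed location for the example database file.
db_path = os.path.join(tempfile.mkdtemp(), "lookup.sqlite")

store = SqliteDict(db_path)
store["hello"] = "there"
store.close()

# Reopen the file to confirm the entry was persisted.
store = SqliteDict(db_path)
sqlite_value = store["hello"]
sqlite_count = len(store)
store.close()
```

This sketch commits after every write, which is simple but slow for bulk loading; for a large initial import you would probably want to batch many inserts into one transaction.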


In principle, the shelve module does exactly what you want. It provides a persistent dictionary backed by a database file. Keys must be strings, but shelve will take care of pickling/unpickling values. The type of db file can vary, but it can be a Berkeley DB hash, which is an excellent lightweight key-value database.
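A minimal sketch of what that looks like in practice. The file name and the sample keys and values are made up for illustration, and which db backend is used depends on what your Python build has available.

```python
import os
import shelve
import tempfile

# Assumed location for the shelf; the backend may add its own file extension.
shelf_path = os.path.join(tempfile.mkdtemp(), "lookup_shelf")

# Keys are strings; values can be any picklable object.
with shelve.open(shelf_path) as shelf:
    shelf["user:1"] = {"name": "Ada", "scores": [1, 2, 3]}
    shelf["user:2"] = ("tuple", 42)

# Reopen to show that the data came back from disk.
with shelve.open(shelf_path) as shelf:
    restored = shelf["user:1"]
    shelf_keys = sorted(shelf.keys())
    missing = shelf.get("user:3", "not present")
```

One thing to watch: mutating a value you read back (for example appending to `restored["scores"]`) is not saved unless you assign it back to the key or open the shelf with `writeback=True`.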

Your data size sounds huge, so you should do some testing first, but shelve/BDB is probably up to it.

Note: The bsddb module has been deprecated (and it was removed from the standard library in Python 3), so shelve may not support BDB hashes in the future.


No one has mentioned dbm. It is opened like a file, behaves like a dictionary and is in the standard distribution.

From the docs (https://docs.python.org/3/library/dbm.html):

```python
import dbm

# Open database, creating it if necessary.
with dbm.open('cache', 'c') as db:

    # Record some values
    db[b'hello'] = b'there'
    db['www.python.org'] = 'Python Website'
    db['www.cnn.com'] = 'Cable News Network'

    # Note that the keys are considered bytes now.
    assert db[b'www.python.org'] == b'Python Website'
    # Notice how the value is now in bytes.
    assert db['www.cnn.com'] == b'Cable News Network'

    # Often-used methods of the dict interface work too.
    print(db.get('python.org', b'not present'))

    # Storing a non-string key or value will raise an exception (most
    # likely a TypeError).
    db['www.yahoo.com'] = 4

# db is automatically closed when leaving the with statement.
```
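Since dbm only accepts strings or bytes, one way to store other values is to serialize them yourself. The following is only a sketch, assuming your values are JSON-serializable; the helper names `put` and `get` and the file location are made up for this example and are not part of dbm.

```python
import dbm
import json
import os
import tempfile

# Assumed location for the example database.
dbm_path = os.path.join(tempfile.mkdtemp(), "cache")


def put(db, key, value):
    """Hypothetical helper: store a JSON-serializable value under a str key."""
    db[key.encode("utf-8")] = json.dumps(value).encode("utf-8")


def get(db, key, default=None):
    """Hypothetical helper: load and decode a value, or return default."""
    raw = db.get(key.encode("utf-8"))
    return default if raw is None else json.loads(raw)


with dbm.open(dbm_path, "c") as db:
    put(db, "www.yahoo.com", 4)  # an int, which plain dbm would reject
    put(db, "www.python.org", {"title": "Python Website"})

# Reopen read-only to check the values round-trip through the file.
with dbm.open(dbm_path, "r") as db:
    yahoo = get(db, "www.yahoo.com")
    python_org = get(db, "www.python.org")
    absent = get(db, "example.invalid", "not present")
```

Only the keys you actually look up are read and decoded, so memory use stays small even when the file is large.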

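I would try this before any of the more exotic options. Pickling the whole dictionary pulls everything into memory on loading, whereas dbm only reads the keys you ask for. (shelve is itself built on top of dbm and unpickles values one key at a time.)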

Cheers

Tim