Use python 2 shelf in python 3


As I understand it now, here is the path that led to my problem:

  • The original shelf was created with Python 2 on Windows
  • Python 2 on Windows defaults to bsddb as the underlying database for shelving, since dbm is not available on the Windows platform
  • Python 3 does not ship with bsddb. On Windows, Python 3 falls back to dumbdbm (renamed dbm.dumb in Python 3) as the underlying database.
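
To confirm which backend a given shelf file actually uses, Python 3 offers dbm.whichdb in the standard library (Python 2 has the equivalent whichdb.whichdb). Here is a minimal sketch; the helper name identify_backend is made up for illustration, and the wording of the two fallback messages is mine:

```python
import dbm


def identify_backend(path):
    """Return a readable guess at the dbm backend behind `path`."""
    kind = dbm.whichdb(path)
    if kind is None:
        # whichdb returns None when the file is missing or unreadable.
        return "missing or unreadable"
    if kind == "":
        # An empty string means the format could not be guessed.
        return "unrecognized format"
    # Otherwise it is the name of the module needed, e.g. "dbm.dumb".
    return kind


if __name__ == "__main__":
    # "shelved.shelf" is the file name used in this thread.
    print(identify_backend("shelved.shelf"))
```

Pass the name the same way you would pass it to shelve.open, that is, without any extension the backend adds on its own.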

At first I looked into installing a third-party bsddb module for Python 3, but it quickly turned into a hassle, and one that would recur every time I needed to use the same shelf file on a new machine. So I decided to convert the file from bsddb to dumbdbm, which both my Python 2 and Python 3 installations can read.

I ran the following in Python 2, which is the version that contains both bsddb and dumbdbm:

    import shelve
    import dumbdbm

    def dumbdbm_shelve(filename, flag="c"):
        return shelve.Shelf(dumbdbm.open(filename, flag))

    out_shelf = dumbdbm_shelve("shelved.dumbdbm.shelf")
    in_shelf = shelve.open("shelved.shelf")

    key_list = in_shelf.keys()
    for key in key_list:
        out_shelf[key] = in_shelf[key]

    out_shelf.close()
    in_shelf.close()

So far it looks like the dumbdbm.shelf files came out ok, pending a double-check of the contents.
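
For the double-check, something along these lines could be run in Python 3. It is only a sketch: open_dumb_shelf and summarize_shelf are names I made up, and it assumes every value unpickles cleanly in Python 3, which may not hold for Python 2 str values with non-ASCII bytes or for classes that are not importable in Python 3.

```python
import dbm.dumb
import shelve


def open_dumb_shelf(filename, flag="r"):
    """Open a shelf stored in the dumbdbm format (dbm.dumb in Python 3)."""
    return shelve.Shelf(dbm.dumb.open(filename, flag))


def summarize_shelf(filename):
    """Return {key: type name}, forcing every value to be unpickled once."""
    shelf = open_dumb_shelf(filename)
    try:
        return {key: type(shelf[key]).__name__ for key in shelf}
    finally:
        shelf.close()


if __name__ == "__main__":
    # File name taken from the conversion script in this answer.
    for key, kind in sorted(summarize_shelf("shelved.dumbdbm.shelf").items()):
        print(key, kind)
```

Comparing the number of keys and a few values against the original shelf opened in Python 2 would complete the check.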


The shelve module uses Python's pickle, which may need an explicit protocol version when a shelf is shared between different versions of Python.

Try supplying protocol version 2:

population = shelve.open('shelved.shelf', protocol=2)

According to the documentation:

Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.

This is most likely the protocol used in the original serialization (or pickling).
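
As a small illustration of how the protocol comes into play, the sketch below pickles a made-up record with protocol 2 and reads it back. As far as I know, the protocol only matters when writing; pickle detects it automatically when loading.

```python
import pickle

record = {"name": "example", "values": [1, 2, 3]}  # made-up sample data

# The protocol is chosen when writing; 2 is the highest one Python 2 can read.
payload = pickle.dumps(record, protocol=2)

# Pickles using protocol 2 or later start with the PROTO opcode (0x80)
# followed by the protocol number.
print(payload[:2])

# No protocol is passed when reading: pickle.loads() works it out itself.
restored = pickle.loads(payload)
print(restored == record)
```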


Edited: You may need to rename your database. Read on...

Seems like pickle is not the culprit here. shelve also relies on anydbm (Python 2.x) or dbm (Python 3) to create/open a database and store the pickled information.
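
To make the two layers visible, here is a sketch (Python 3) that writes an entry through shelve and then reads the same file with the dbm layer alone. It uses dbm.dumb only because that backend is always available; the file name "demo" and the stored value are arbitrary.

```python
import dbm.dumb
import os
import pickle
import shelve
import tempfile

# Work in a throwaway directory; "demo" is an arbitrary file name.
path = os.path.join(tempfile.mkdtemp(), "demo")

# Write one entry through shelve on top of an explicit dbm.dumb database.
shelf = shelve.Shelf(dbm.dumb.open(path, "c"), protocol=2)
shelf["answer"] = {"value": 42}
shelf.close()

# Read the same file without shelve: keys are UTF-8 bytes, values are pickles.
raw = dbm.dumb.open(path, "r")
raw_value = raw[b"answer"]
raw.close()

unpickled = pickle.loads(raw_value)
print(unpickled)
```

So a shelf that cannot be opened may fail at either layer: the dbm backend may not recognize the file, or the pickles inside may not load.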

I created (manually) a database file using the following:

    # Python 2.7
    import anydbm
    anydbm.open('database2', flag='c')

and

    # Python 3.4
    import dbm
    dbm.open('database3', flag='c')

In both cases, it creates the same kind of database (may be distribution dependent, this is on Debian 7):

    $ file *
    database2:    Berkeley DB (Hash, version 9, native byte-order)
    database3.db: Berkeley DB (Hash, version 9, native byte-order)

anydbm can open database3.db without problems, as expected:

    >>> anydbm.open('database3')
    <dbm.dbm object at 0x7fb1089900f0>

Notice the lack of .db when specifying the database name, though. But dbm chokes on database2, which is weird:

    >>> dbm.open('database2')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.4/dbm/__init__.py", line 88, in open
        raise error[0]("db type could not be determined")
    dbm.error: db type could not be determined

unless I change the name of the database to database2.db:

    $ mv database2 database2.db
    $ python3
    >>> import dbm
    >>> dbm.open('database2')
    <_dbm.dbm object at 0x7fa7eaefcf50>

So, I suspect a regression in the dbm module, but I haven't checked the documentation. It may be intended :-?

NB: In my case the extension is .db, but that depends on the database that dbm uses by default! Create an empty shelf using Python 3 to figure out which one you are using and what it expects.
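
One way to run that experiment without cluttering the working directory is sketched below (Python 3). The function name probe_default_backend and the base name "probe" are made up for illustration; the files that appear depend on the dbm backends available in your installation.

```python
import dbm
import os
import shelve
import tempfile


def probe_default_backend():
    """Create an empty shelf in a temp directory and report what lands on disk."""
    with tempfile.TemporaryDirectory() as folder:
        base = os.path.join(folder, "probe")  # arbitrary base name
        shelve.open(base, flag="c").close()
        # The file names show which extension(s) the default backend adds.
        files = sorted(os.listdir(folder))
        # whichdb names the module that dbm would use to reopen the file.
        backend = dbm.whichdb(base)
    return files, backend


if __name__ == "__main__":
    files, backend = probe_default_backend()
    print("files created:", files)
    print("backend reported by dbm.whichdb:", backend)
```

Whatever suffix shows up after "probe" is what the default backend will look for when it opens an existing shelf by its base name.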