Use a Python 2 shelf in Python 3
As I understand it now, here is the path that led to my problem:
- The original shelf was created with Python 2 in Windows
- Python 2 on Windows defaults to bsddb as the underlying database for shelving, since dbm is not available on the Windows platform
- Python 3 does not ship with bsddb. On Windows, the underlying database in Python 3 is dbm.dumb, which is the Python 3 name for dumbdbm.
I first looked into installing a third-party bsddb module for Python 3, but it quickly turned into a hassle. It also looked like a recurring hassle, since I would face it any time I needed to use the same shelf file on a new machine. So I decided to convert the file from bsddb to dumbdbm, which both my Python 2 and Python 3 installations can read.
I ran the following in Python 2, which is the version that contains both bsddb and dumbdbm:
    import shelve
    import dumbdbm

    def dumbdbm_shelve(filename, flag="c"):
        # Build a shelf on top of dumbdbm instead of the platform default.
        return shelve.Shelf(dumbdbm.open(filename, flag))

    out_shelf = dumbdbm_shelve("shelved.dumbdbm.shelf")
    in_shelf = shelve.open("shelved.shelf")

    # Copy every entry from the bsddb shelf into the dumbdbm shelf.
    key_list = in_shelf.keys()
    for key in key_list:
        out_shelf[key] = in_shelf[key]

    out_shelf.close()
    in_shelf.close()
So far it looks like the dumbdbm.shelf files came out ok, pending a double-check of the contents.
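For the pending double-check, one option is to open both files and compare them key by key. Below is a minimal sketch for the Python 3 side, assuming the converted file is opened explicitly through dbm.dumb. The names open_dumb_shelf and shelves_match are hypothetical helpers, not part of the standard library.

```python
import dbm.dumb
import shelve


def open_dumb_shelf(filename, flag="r"):
    """Open a shelf stored in dbm.dumb format (dumbdbm in Python 2)."""
    return shelve.Shelf(dbm.dumb.open(filename, flag))


def shelves_match(left, right):
    """Return True when both shelves hold the same keys and equal values."""
    if set(left.keys()) != set(right.keys()):
        return False
    return all(left[key] == right[key] for key in left.keys())
```

shelves_match uses nothing version-specific, so it should also run under Python 2 against the original bsddb shelf, with dumbdbm.open taking the place of dbm.dumb.open.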
The shelve module uses Python's pickle, which may require a protocol version when being accessed between different versions of Python.
Try supplying protocol version 2:
population = shelve.open('shelved.shelf', protocol=2)
According to the documentation:
Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.
This is most likely the protocol used in the original serialization (or pickling).
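As a quick check of how the protocol comes into play, the sketch below pickles a value with an explicit protocol and loads it back. Note that pickle.loads takes no protocol argument: the reader detects the protocol from the data itself, so the protocol setting only controls how new entries are written. The name roundtrip is a hypothetical helper, and the sample dictionary is made up.

```python
import pickle


def roundtrip(obj, protocol):
    """Pickle obj with the given protocol, then load it back."""
    data = pickle.dumps(obj, protocol=protocol)
    # loads() is given no protocol: it is read from the pickled data.
    return data, pickle.loads(data)


data, restored = roundtrip({"name": "example", "values": [1, 2, 3]}, protocol=2)
print(restored)
```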
Edited: You may need to rename your database. Read on...
Seems like pickle is not the culprit here. shelve also relies on anydbm (Python 2.x) or dbm (Python 3) to create/open a database and store the pickled information.
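This layering can be seen by writing through shelve and then reading the same file through dbm directly. Here is a minimal sketch for Python 3, assuming a throwaway temporary directory. The file name layers and the key answer are made up for illustration.

```python
import dbm
import os
import pickle
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "layers")

# Write one entry through shelve, which pickles the value for us.
with shelve.open(path) as shelf:
    shelf["answer"] = 42

# Read the same database through dbm: keys and values are raw bytes.
with dbm.open(path, "r") as raw:
    raw_value = raw[b"answer"]

# The stored bytes are an ordinary pickle of the original value.
value = pickle.loads(raw_value)
print(value)
```

So there are two places where a shelf can fail to open across versions: the pickle layer and the database layer underneath it.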
I created (manually) a database file using the following:
    # Python 2.7
    import anydbm
    anydbm.open('database2', flag='c')
and
    # Python 3.4
    import dbm
    dbm.open('database3', flag='c')
In both cases, it creates the same kind of database (may be distribution dependent, this is on Debian 7):
    $ file *
    database2: Berkeley DB (Hash, version 9, native byte-order)
    database3.db: Berkeley DB (Hash, version 9, native byte-order)
anydbm can open database3.db without problems, as expected:
    >>> anydbm.open('database3')
    <dbm.dbm object at 0x7fb1089900f0>
Notice the lack of .db when specifying the database name, though. But dbm chokes on database2, which is weird:
    >>> dbm.open('database2')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.4/dbm/__init__.py", line 88, in open
        raise error[0]("db type could not be determined")
    dbm.error: db type could not be determined
unless I change the name of the database to database2.db:
    $ mv database2 database2.db
    $ python3
    >>> import dbm
    >>> dbm.open('database2')
    <_dbm.dbm object at 0x7fa7eaefcf50>
So I suspect a regression in the dbm module, but I haven't checked the documentation. It may be intended :-?
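One way to investigate this in Python 3 is dbm.whichdb, the function dbm.open uses to guess the format of an existing file. As documented, it returns None when the file cannot be opened, an empty string when the format cannot be guessed, and a module name such as 'dbm.gnu' otherwise. The sketch below only wraps those three outcomes. The name describe_db is a hypothetical helper.

```python
import dbm


def describe_db(filename):
    """Summarize what dbm.whichdb reports for filename."""
    kind = dbm.whichdb(filename)
    if kind is None:
        return "cannot be opened (missing or unreadable)"
    if kind == "":
        return "format could not be guessed"
    return "looks like " + kind
```

Pass the name the same way you would pass it to dbm.open, for example describe_db('database2'), before and after renaming the file.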
NB: Notice that in my case the extension is .db, but that depends on the database that dbm uses by default! Create an empty shelf using Python 3 to figure out which one you are using and what it is expecting.
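A minimal sketch of that check for Python 3: it creates an empty shelf in a temporary directory, lists the files that appear, and asks dbm.whichdb which module recognizes them. The base name probe and the helper probe_default_backend are made up for illustration, and the output depends on how your Python was built.

```python
import dbm
import os
import shelve
import tempfile


def probe_default_backend():
    """Create an empty shelf and report its files and detected dbm module."""
    directory = tempfile.mkdtemp()
    base = os.path.join(directory, "probe")
    shelve.open(base).close()
    files = sorted(os.listdir(directory))
    return files, dbm.whichdb(base)


files, backend = probe_default_backend()
print("files created:", files)
print("dbm module:", backend)
```

The file names show which extension, if any, your dbm adds to the name you pass in, which is what an existing Python 2 database would have to match.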