python unzip -- tremendously slow?


I was also struggling to unzip/decompress/extract zip files with Python, and the low-level approach of "create a ZipFile object, loop through its .namelist(), read each file and write it to the file system" didn't seem very Pythonic. So I started digging into ZipFile objects, which I don't think are very well documented, and listed all of the object's methods:

>>> from zipfile import ZipFile
>>> filepath = '/srv/pydocfiles/packages/ebook.zip'
>>> zip = ZipFile(filepath)
>>> dir(zip)
['NameToInfo', '_GetContents', '_RealGetContents', '__del__', '__doc__', '__enter__', '__exit__', '__init__', '__module__', '_allowZip64', '_didModify', '_extract_member', '_filePassed', '_writecheck', 'close', 'comment', 'compression', 'debug', 'extract', 'extractall', 'filelist', 'filename', 'fp', 'getinfo', 'infolist', 'mode', 'namelist', 'open', 'printdir', 'pwd', 'read', 'setpassword', 'start_dir', 'testzip', 'write', 'writestr']

There we go: the "extractall" method works just like tarfile's extractall! (It is available on Python 2.6 and 2.7 but NOT on 2.5.)
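As a minimal sketch of the extractall approach on a current Python 3, here is a version that builds its own throwaway archive in a temporary directory so it does not depend on any existing file. The helper name extract_zip and the member names are made up for illustration:

```python
import os
import tempfile
from zipfile import ZipFile


def extract_zip(archive_path, dest_dir):
    """Extract every member of archive_path into dest_dir; return the member names."""
    with ZipFile(archive_path) as zf:
        zf.extractall(path=dest_dir)
        return zf.namelist()


# Build a small demo archive (hypothetical contents, for illustration only).
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "demo.zip")
with ZipFile(archive, "w") as zf:
    zf.writestr("docs/readme.txt", "hello")
    zf.writestr("notes.txt", "world")

extracted = extract_zip(archive, os.path.join(workdir, "out"))
print(extracted)
```

The with block closes the archive for you, and extractall creates the destination directory and any subdirectories it needs.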

Then the performance concerns: the file ebook.zip is 84.6 MB (mostly PDF files) and the uncompressed folder is 103 MB; it was zipped by the default "Archive Utility" under Mac OS X 10.5. So I timed the extraction with Python's timeit module:

>>> from timeit import Timer
>>> t = Timer("filepath = '/srv/pydocfiles/packages/ebook.zip'; \
...         extract_to = '/tmp/pydocnet/build'; \
...         from zipfile import ZipFile; \
...         ZipFile(filepath).extractall(path=extract_to)")
>>>
>>> t.timeit(1)
1.8670060634613037

That took less than 2 seconds on a heavily loaded machine where 90% of the memory was being used by other applications.
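If you want to reproduce this kind of measurement yourself, the following is a rough sketch (Python 3) that times the manual "loop over namelist()" approach against extractall on a generated archive. The archive contents, the helper names and the directory layout are all assumptions for the demo, and the numbers will vary from machine to machine:

```python
import os
import tempfile
import time
from zipfile import ZipFile, ZIP_DEFLATED


def extract_manually(archive_path, dest_dir):
    """Low-level approach: read each member and write it out by hand."""
    with ZipFile(archive_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            # Demo only: names from an untrusted archive should not be joined like this.
            target = os.path.join(dest_dir, name)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as out:
                out.write(zf.read(name))


def extract_with_extractall(archive_path, dest_dir):
    """High-level approach: let ZipFile do all of the work."""
    with ZipFile(archive_path) as zf:
        zf.extractall(path=dest_dir)


def time_call(func, *args):
    """Return the wall-clock seconds taken by one call to func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


# Generate a sample archive of 20 text files (hypothetical data).
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "sample.zip")
with ZipFile(archive, "w", ZIP_DEFLATED) as zf:
    for i in range(20):
        zf.writestr("files/doc%02d.txt" % i, ("line %d\n" % i) * 50000)

manual_dir = os.path.join(workdir, "manual")
auto_dir = os.path.join(workdir, "auto")
manual_seconds = time_call(extract_manually, archive, manual_dir)
auto_seconds = time_call(extract_with_extractall, archive, auto_dir)
print("manual loop: %.3f s, extractall: %.3f s" % (manual_seconds, auto_seconds))
```

A single run like this is only a ballpark figure; for anything more careful, repeat the measurement a few times and extract into a fresh directory each time.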

Hope this helps someone.


I don't know what code you use to unzip your file, but the following works for me. After creating a zip archive "test.zip" containing just one file, "file1", the following Python script extracts "file1" from the archive:

from zipfile import ZipFile, ZIP_DEFLATED
zip = ZipFile("test.zip", mode='r', compression=ZIP_DEFLATED, allowZip64=False)
data = zip.read("file1")  # returns the uncompressed contents of the member
print(len(data))

This takes nearly no time: I tried a 37MB input file which compressed down to a 15MB zip archive. In this example the Python script took 0.346 seconds on my MacBook Pro. Maybe in your case the 37 seconds were taken up by something you did with the data instead?
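To check where the time goes in your own case, something along these lines may help. It is only a sketch (Python 3): it writes a synthetic, highly compressible "file1" into a temporary test.zip, then times just the read and reports the compressed and uncompressed sizes from the member's ZipInfo. The payload is invented for the demo:

```python
import os
import tempfile
import time
from zipfile import ZipFile, ZIP_DEFLATED

# Synthetic payload (an assumption for the demo), about 5.6 MB uncompressed.
payload = b"some fairly repetitive text\n" * 200000

workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "test.zip")
with ZipFile(archive, "w", ZIP_DEFLATED) as zf:
    zf.writestr("file1", payload)

with ZipFile(archive) as zf:
    info = zf.getinfo("file1")
    start = time.perf_counter()
    data = zf.read("file1")  # time only the decompression itself
    elapsed = time.perf_counter() - start

print("compressed: %d bytes, uncompressed: %d bytes"
      % (info.compress_size, info.file_size))
print("read took %.3f seconds" % elapsed)
```

If the read itself is fast but the whole script is slow, the time is probably being spent in whatever is done with the data afterwards.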


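Instead of using the Python module, we can call the 7z command-line tool available on Ubuntu from within Python. I use this because the Python zipfile module sometimes fails. Note that "7z a" creates an archive; to extract one you would use "7z x" instead.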

import os
filename = 'test'  # name of the file or folder to archive
# "7z a" adds the given file or folder to test.zip
os.system('7z a %s.zip %s' % (filename, filename))
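For the extraction direction, a hedged sketch using subprocess instead of os.system might look like this. It assumes the 7z binary (from the p7zip package on Ubuntu) is on the PATH; the helper names and the example paths are hypothetical:

```python
import shutil
import subprocess


def build_7z_extract_command(archive_path, dest_dir):
    """Return the argument list for extracting archive_path into dest_dir with 7z."""
    # "x" extracts with full paths, "-o<dir>" sets the output directory
    # (no space after -o), and "-y" answers yes to any overwrite prompt.
    return ["7z", "x", archive_path, "-o" + dest_dir, "-y"]


def extract_with_7z(archive_path, dest_dir):
    """Run 7z if it is installed; return False if the binary cannot be found."""
    if shutil.which("7z") is None:
        return False
    # check=True raises CalledProcessError if 7z exits with an error.
    subprocess.run(build_7z_extract_command(archive_path, dest_dir), check=True)
    return True


cmd = build_7z_extract_command("test.zip", "/tmp/out")
print(cmd)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, which the '%s' string formatting with os.system does not handle.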