unzipping file results in "BadZipFile: File is not a zip file"
Files named file can confuse Python, so try naming it something else. If it still won't work, try this code:

    def fixBadZipfile(zipFile):
        # Open for in-place binary read/write so the file can be truncated.
        with open(zipFile, 'r+b') as f:
            data = f.read()
            pos = data.find(b'\x50\x4b\x05\x06')  # End of central directory signature
            if pos < 0:
                # No signature at all: the file itself is truncated.
                raise ValueError("File is truncated: no end of central directory record")
            print("Truncating file at location " + str(pos + 22) + ".")
            f.seek(pos + 22)  # size of 'ZIP end of central directory record'
            f.truncate()

astronautlevel's solution works for most cases, but the compressed data and CRCs in the zip can also contain the same 4 bytes. You should do an rfind (not find), seek to pos+20, truncate there, and then write \x00\x00 to the end of the file (this tells zip applications that the 'comments' section is 0 bytes long).

    # HACK: See http://bugs.python.org/issue10694
    # The zip file generated is correct, but because of extra data after the 'central directory' section,
    # some versions of Python (and some zip applications) can't read the file. By removing the extra data,
    # we ensure that all applications can read the zip without issue.
    # The ZIP format: http://www.pkware.com/documents/APPNOTE/APPNOTE-6.3.0.TXT
    # Finding the end of the central directory:
    #   http://stackoverflow.com/questions/8593904/how-to-find-the-position-of-central-directory-in-a-zip-file
    #   http://stackoverflow.com/questions/20276105/why-cant-python-execute-a-zip-archive-passed-via-stdin
    #       This second link is only loosely related, but echoes the first:
    #       "processing a ZIP archive often requires backwards seeking"
    # (Fragment from inside a function; zipFileContainer is a file-like object opened in binary mode.)
    content = zipFileContainer.read()
    pos = content.rfind(b'\x50\x4b\x05\x06')  # reverse find: this string of bytes is the end of the zip's central directory.
    if pos > 0:
        zipFileContainer.seek(pos + 20)  # +20: see section V.I in 'ZIP format' link above.
        zipFileContainer.truncate()
        zipFileContainer.write(b'\x00\x00')  # Zip file comment length: 0 byte length; tell zip applications to stop reading.

    zipFileContainer.seek(0)  # rewind so the caller can read from the start
    return zipFileContainer
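
For anyone who wants to try the rfind approach on its own, here is a minimal self-contained sketch for Python 3. The function name strip_trailing_data and the two constants are made up for this example and are not part of zipfile. It assumes the stream is seekable and opened in binary read/write mode, that the junk after the archive does not itself contain the 4 signature bytes, and that you don't mind losing any real archive comment.

```python
import io
import zipfile

# Hypothetical names, for illustration only.
EOCD_SIGNATURE = b"\x50\x4b\x05\x06"  # "PK\x05\x06", end of central directory
COMMENT_LENGTH_OFFSET = 20            # comment-length field sits 20 bytes into the record


def strip_trailing_data(zip_stream):
    """Cut off everything after the last end-of-central-directory record.

    zip_stream must be a seekable binary stream opened for reading and
    writing (for example io.BytesIO, or a file opened with 'r+b').
    """
    zip_stream.seek(0)
    content = zip_stream.read()
    pos = content.rfind(EOCD_SIGNATURE)  # search backwards, not forwards
    if pos < 0:
        raise zipfile.BadZipFile("no end of central directory record found")
    zip_stream.seek(pos + COMMENT_LENGTH_OFFSET)
    zip_stream.truncate()            # drop the old comment length and anything after it
    zip_stream.write(b"\x00\x00")    # comment length = 0, so the record ends here
    zip_stream.seek(0)               # rewind for the caller
    return zip_stream


# Build a small archive in memory, append junk, then repair it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "hello")
damaged = io.BytesIO(buf.getvalue() + b"\r\ntrailing junk")
repaired = strip_trailing_data(damaged)
print(zipfile.ZipFile(repaired).namelist())  # ['a.txt']
```

For a file on disk, open it with open(path, 'r+b') and pass the file object in. Work on a copy, since the truncation is done in place.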