Decompress bz2 files
bz2.compress/decompress work with binary data:
    >>> import bz2
    >>> compressed = bz2.compress(b'test_string')
    >>> compressed
    b'BZh91AY&SYJ|i\x05\x00\x00\x04\x83\x80\x00\x00\x82\xa1\x1c\x00 \x00"\x03h\x840"P\xdf\x04\x99\xe2\xeeH\xa7\n\x12\tO\x8d \xa0'
    >>> bz2.decompress(compressed)
    b'test_string'
In short, you need to read the file contents yourself and pass the bytes to bz2. If you have very large files, prefer bz2.BZ2Decompressor to bz2.decompress, because the latter requires holding the entire file in memory as a single bytes object.
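    import bz2
    import os

    # dirpath and files are assumed to be defined already (e.g. by os.walk)
    for filename in files:
        filepath = os.path.join(dirpath, filename)
        newfilepath = os.path.join(dirpath, filename + '.decompressed')
        with open(newfilepath, 'wb') as new_file, open(filepath, 'rb') as file:
            decompressor = bz2.BZ2Decompressor()
            # read the compressed file in 100 KiB chunks until read() returns b''
            for data in iter(lambda: file.read(100 * 1024), b''):
                new_file.write(decompressor.decompress(data))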
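To see why the chunked loop works, here is a minimal in-memory sketch: a BZ2Decompressor accepts the compressed stream in arbitrary pieces and returns whatever output it can produce so far. The payload and the 16-byte slice size are arbitrary values chosen for illustration only.

```python
import bz2

original = b'test_string' * 1000
compressed = bz2.compress(original)

decompressor = bz2.BZ2Decompressor()
pieces = []
# Feed the compressed stream in small slices, as a file read loop would.
for start in range(0, len(compressed), 16):
    pieces.append(decompressor.decompress(compressed[start:start + 16]))

result = b''.join(pieces)
print(result == original)  # True
print(decompressor.eof)    # True: the end of the bz2 stream was reached
```

Only one slice of compressed input needs to be in memory at a time, which is what makes this approach suitable for large files.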
You can also use bz2.BZ2File to make this even simpler:
    for filename in files:
        filepath = os.path.join(dirpath, filename)
        newfilepath = os.path.join(dirpath, filename + '.decompressed')
        with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
            # BZ2File.read() returns already-decompressed bytes
            for data in iter(lambda: file.read(100 * 1024), b''):
                new_file.write(data)
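If you need this in more than one place, the same streaming idea can be wrapped in a small function. This is only a sketch: decompress_file and its parameter names are made up for illustration, and it swaps the manual read loop for shutil.copyfileobj, with bz2.open standing in for bz2.BZ2File.

```python
import bz2
import shutil


def decompress_file(src_path, dst_path, chunk_size=100 * 1024):
    """Stream-decompress the bz2 file at src_path into dst_path."""
    # bz2.open in binary mode returns a file object that yields decompressed bytes.
    with bz2.open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        # copyfileobj copies chunk_size bytes at a time, so memory use stays bounded.
        shutil.copyfileobj(src, dst, chunk_size)
```

For example, decompress_file('data.bz2', 'data') would write the decompressed contents of a (hypothetical) data.bz2 to data.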
bz2.decompress takes compressed data and inflates it. You are passing a filename, not the data in the file!
Do this instead:
    with bz2.BZ2File(filepath) as archive:  # open the file
        data = archive.read()  # get the decompressed data
    newfilepath = filepath[:-4]  # assuming the filepath ends with .bz2
    with open(newfilepath, 'wb') as new_file:
        new_file.write(data)  # write an uncompressed file
This should work:

    import bz2
    import os

    for filename in files:
        archive_path = os.path.join(dirpath, filename)
        outfile_path = os.path.join(dirpath, filename[:-4])  # strip '.bz2'
        with open(archive_path, 'rb') as source, open(outfile_path, 'wb') as dest:
            dest.write(bz2.decompress(source.read()))
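One caveat about the [:-4] slices used in this thread: they silently chop four characters off any name, whether or not it ends in .bz2. A hedged sketch of a safer way to derive the output name follows; output_path_for is a hypothetical helper, and the '.decompressed' fallback simply mirrors the suffix used in the earlier answer.

```python
import os


def output_path_for(archive_path):
    """Drop a trailing '.bz2' if present, otherwise append '.decompressed'."""
    root, ext = os.path.splitext(archive_path)
    if ext.lower() == '.bz2':
        return root
    return archive_path + '.decompressed'


print(output_path_for('logs/data.txt.bz2'))  # logs/data.txt
print(output_path_for('logs/data'))          # logs/data.decompressed
```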