Python ungzipping stream of bytes?
Yes, you can use the zlib module to decompress byte streams:
    import zlib

    def stream_gzip_decompress(stream):
        # offset 32 makes zlib auto-detect (and strip) the gzip or zlib header
        dec = zlib.decompressobj(32 + zlib.MAX_WBITS)
        for chunk in stream:
            rv = dec.decompress(chunk)
            if rv:
                yield rv
Adding 32 to the window-size argument (wbits) tells zlib to detect the header automatically: it accepts either a gzip or a zlib header and strips it before decompressing the data.
The S3 key object is an iterator, so you can do:
    for data in stream_gzip_decompress(k):
        pass  # do something with the decompressed data
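If you want to try this generator without an S3 connection, here is a minimal sketch that feeds it in-memory chunks. The `chunked` helper and the sample payload are made up for illustration, and I added a final `flush()` call as a precaution so that any data still buffered at the end of the input is emitted.

```python
import gzip
import zlib


def stream_gzip_decompress(stream):
    # 32 + MAX_WBITS: let zlib auto-detect a gzip or zlib header
    dec = zlib.decompressobj(32 + zlib.MAX_WBITS)
    for chunk in stream:
        rv = dec.decompress(chunk)
        if rv:
            yield rv
    # Emit anything still buffered once the input is exhausted
    tail = dec.flush()
    if tail:
        yield tail


def chunked(data, size):
    """Hypothetical helper: split bytes into fixed-size chunks, like a network stream."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


# Sample payload (an assumption, standing in for the S3 object contents)
original = b"hello gzip stream\n" * 1000
compressed = gzip.compress(original)

restored = b"".join(stream_gzip_decompress(chunked(compressed, 64)))
print(restored == original)
```

Because the decompressor is fed 64 bytes at a time, only one chunk plus zlib's internal window needs to be in memory at once, which is the point of streaming instead of buffering the whole object.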
I had to do the same thing and this is how I did it:
    import gzip
    import StringIO  # Python 2 module

    f = StringIO.StringIO()
    k.get_file(f)
    f.seek(0)  # This is crucial: rewind the buffer before reading it
    gzf = gzip.GzipFile(fileobj=f)
    file_content = gzf.read()
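On Python 3 the `StringIO` module no longer exists, and `io.BytesIO` is the usual replacement for binary data. A rough sketch of the same buffer-then-read approach, with a locally compressed payload standing in for the downloaded key (that payload is an assumption for the example, not part of the original code):

```python
import gzip
import io

# Stand-in for the bytes that would be downloaded from S3 (assumed sample data)
payload = gzip.compress(b"line one\nline two\n")

f = io.BytesIO()
f.write(payload)
f.seek(0)  # rewind, otherwise GzipFile starts reading at the end of the buffer

with gzip.GzipFile(fileobj=f) as gzf:
    file_content = gzf.read()

print(file_content)
```

Note that this holds the whole compressed object and the whole decompressed result in memory, so it is only suitable when the file comfortably fits in RAM.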
For Python 3.x and boto3:
I used BytesIO to read the compressed file into a buffer object, then used zipfile to open the archive from that buffer and read its first member as decompressed data, line by line.
    import io
    import sys
    import zipfile

    import boto3

    s3 = boto3.resource('s3', 'us-east-1')

    def stream_zip_file():
        count = 0
        obj = s3.Object(
            bucket_name='MonkeyBusiness',
            key='/Daily/Business/Banana/{current-date}/banana.zip'
        )
        # Read the whole object into an in-memory buffer
        buffer = io.BytesIO(obj.get()["Body"].read())
        print(buffer)
        z = zipfile.ZipFile(buffer)
        # Open the first member of the archive as a file-like object
        foo2 = z.open(z.infolist()[0])
        print(sys.getsizeof(foo2))
        line_counter = 0
        for _ in foo2:
            line_counter += 1
        print(line_counter)
        z.close()

    if __name__ == '__main__':
        stream_zip_file()
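The zipfile part can be tried on its own, without boto3. The sketch below builds a small archive in memory and counts the lines of its first member. The member name `banana.txt`, its contents, and the `count_lines` helper are invented for the example.

```python
import io
import zipfile

# Build a small zip archive in memory (assumed sample data, not from S3)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("banana.txt", "a\nb\nc\n")
buffer.seek(0)


def count_lines(zip_buffer):
    """Hypothetical helper: count the lines in the first member of a zip archive."""
    with zipfile.ZipFile(zip_buffer) as z:
        with z.open(z.infolist()[0]) as member:
            # Wrap the binary member so iteration yields decoded text lines
            text = io.TextIOWrapper(member, encoding="utf-8")
            return sum(1 for _ in text)


line_count = count_lines(buffer)
print(line_count)
```

Unlike gzip, a zip archive keeps its directory at the end of the file, so `zipfile` needs a seekable object. That is why the whole archive is buffered here rather than decompressed chunk by chunk.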