Python ungzipping stream of bytes?


Yes, you can use the zlib module to decompress byte streams:

    import zlib

    def stream_gzip_decompress(stream):
        # 32 + MAX_WBITS: auto-detect a gzip or zlib header
        dec = zlib.decompressobj(32 + zlib.MAX_WBITS)
        for chunk in stream:
            rv = dec.decompress(chunk)
            if rv:
                yield rv

Adding 32 to the window-size argument tells zlib to detect the header automatically: it accepts either a gzip or a zlib header, parses it, and then decompresses the data that follows.
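To see what the window-size argument does, here is a small sketch that compresses the same bytes with gzip and with zlib and decodes both using one decompressor setting. The payload and the helper name inflate are made up for illustration; they are not part of the answer above.

```python
import gzip
import zlib

payload = b"hello, stream" * 100  # arbitrary sample data

def inflate(data, wbits):
    """Decompress `data` in one go with the given window-size argument."""
    dec = zlib.decompressobj(wbits)
    return dec.decompress(data) + dec.flush()

gz_bytes = gzip.compress(payload)  # gzip container
zl_bytes = zlib.compress(payload)  # zlib container

# 32 + MAX_WBITS: header auto-detection, accepts both containers.
auto_gzip = inflate(gz_bytes, 32 + zlib.MAX_WBITS)
auto_zlib = inflate(zl_bytes, 32 + zlib.MAX_WBITS)

# 16 + MAX_WBITS: gzip only, so a zlib stream raises zlib.error.
try:
    inflate(zl_bytes, 16 + zlib.MAX_WBITS)
    gzip_only_rejects_zlib = False
except zlib.error:
    gzip_only_rejects_zlib = True
```

If the input is known to be gzip, 16 + zlib.MAX_WBITS is the stricter choice; 32 + zlib.MAX_WBITS is convenient when the source may send either format.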

The S3 key object (k below) is an iterator that yields the object's bytes in chunks, so you can do:

    for data in stream_gzip_decompress(k):
        # do something with the decompressed data
        pass


I had to do the same thing and this is how I did it:

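    import gzip
    import StringIO  # Python 2 module; on Python 3 use io.BytesIO instead

    f = StringIO.StringIO()
    k.get_file(f)  # k is the S3 key; writes the compressed bytes into f
    f.seek(0)  # This is crucial: rewind so GzipFile reads from the start
    gzf = gzip.GzipFile(fileobj=f)
    file_content = gzf.read()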
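On Python 3 the StringIO module is gone and compressed data has to go into a bytes buffer. Below is a minimal sketch of the same idea with io.BytesIO. It uses an in-memory gzip payload in place of the S3 download, so the raw variable is an assumption standing in for whatever k.get_file(f) would have written.

```python
import gzip
import io

# Stand-in for the downloaded object; in the answer above this comes from S3.
raw = gzip.compress(b"line one\nline two\n")

f = io.BytesIO()
f.write(raw)  # roughly what k.get_file(f) does with a real key
f.seek(0)     # rewind, otherwise GzipFile starts reading at the end of the buffer

with gzip.GzipFile(fileobj=f) as gzf:
    file_content = gzf.read()
```

This still holds the whole compressed object in memory, so it suits small files; the chunk-by-chunk zlib generator avoids that.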


For Python 3.x and boto3:

I used BytesIO to read the compressed file into an in-memory buffer, then used zipfile to open the first member of the archive and read its data line by line. Note that this handles a .zip archive, not a gzip stream.

    import io
    import sys
    import zipfile

    import boto3

    s3 = boto3.resource('s3', 'us-east-1')

    def stream_zip_file():
        obj = s3.Object(
            bucket_name='MonkeyBusiness',
            key='/Daily/Business/Banana/{current-date}/banana.zip'
        )
        # Read the whole object into memory; ZipFile needs a seekable file
        buffer = io.BytesIO(obj.get()["Body"].read())
        print(buffer)
        z = zipfile.ZipFile(buffer)
        foo2 = z.open(z.infolist()[0])  # first member of the archive
        print(sys.getsizeof(foo2))  # size of the Python object, not of the file
        line_counter = 0
        for _ in foo2:
            line_counter += 1
        print(line_counter)
        z.close()

    if __name__ == '__main__':
        stream_zip_file()
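The zipfile part can be tried without S3. The sketch below builds a small archive in memory as a stand-in for obj.get()["Body"].read() and reads the first member line by line; the member name banana.txt and its contents are invented for the example.

```python
import io
import zipfile

# Build a small zip archive in memory; a stand-in for the bytes fetched from S3.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("banana.txt", "ripe\ngreen\nspotted\n")
archive.seek(0)

lines = []
with zipfile.ZipFile(archive) as z:
    first_member = z.infolist()[0]
    with z.open(first_member) as member:
        # The member yields bytes; TextIOWrapper decodes them into str lines.
        for line in io.TextIOWrapper(member, encoding="utf-8"):
            lines.append(line.rstrip("\n"))
```

Wrapping the member in io.TextIOWrapper is optional; iterating over the member directly, as the answer does, yields bytes lines instead of str.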