Download and decompress gzipped file in memory?


You need to seek back to the beginning of compressedFile after writing to it and before passing it to gzip.GzipFile(). Otherwise the gzip module starts reading at the end of the buffer, and the data appears to it as an empty file. See below:

    #!/usr/bin/env python
    import urllib2
    import StringIO
    import gzip

    baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
    filename = "man-pages-3.34.tar.gz"
    outFilePath = "man-pages-3.34.tar"

    response = urllib2.urlopen(baseURL + filename)
    compressedFile = StringIO.StringIO()
    compressedFile.write(response.read())

    #
    # Set the file's current position to the beginning
    # of the file so that gzip.GzipFile can read
    # its contents from the top.
    #
    compressedFile.seek(0)

    decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')

    # Open in binary mode: the decompressed tar archive is not text.
    with open(outFilePath, 'wb') as outfile:
        outfile.write(decompressedFile.read())
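To see the effect of the stream position without touching the network, here is a minimal Python 3 sketch. The `payload` bytes and the variable names are made up for illustration; the compressed data is built locally with `gzip.compress` instead of being downloaded.

```python
import gzip
import io

# Hypothetical sample data standing in for a downloaded file.
payload = b"hello, gzip\n"

buffer = io.BytesIO()
buffer.write(gzip.compress(payload))  # position is now at the end of the buffer

# Reading without rewinding: gzip sees no bytes ahead of the current position.
at_end = gzip.GzipFile(fileobj=buffer, mode="rb").read()

# Rewind, then read again: the full stream is available.
buffer.seek(0)
from_start = gzip.GzipFile(fileobj=buffer, mode="rb").read()

print(repr(at_end))
print(repr(from_start))
```

The first read comes back empty and the second returns the original payload, which is the behaviour described above.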


For those using Python 3, the equivalent answer is:

    import urllib.request
    import io
    import gzip

    # FILE_URL and OUTFILE_PATH are placeholders; set them to your own URL and output path.
    response = urllib.request.urlopen(FILE_URL)
    compressed_file = io.BytesIO(response.read())
    decompressed_file = gzip.GzipFile(fileobj=compressed_file)

    with open(OUTFILE_PATH, 'wb') as outfile:
        outfile.write(decompressed_file.read())
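Note that this version has no seek(0) call. A BytesIO created from initial bytes starts at position 0, whereas one filled with write() ends up positioned at the end. A small sketch of the same idea wrapped in a helper (the name `gunzip_bytes` is hypothetical, not part of any library):

```python
import gzip
import io


def gunzip_bytes(compressed):
    """Decompress gzip data held in memory through a file-like wrapper."""
    # Passing the bytes to the constructor leaves the position at 0,
    # so no explicit seek(0) is needed before reading.
    buffer = io.BytesIO(compressed)
    with gzip.GzipFile(fileobj=buffer, mode="rb") as gz:
        return gz.read()


# Local round trip with made-up data instead of a real download.
sample = gzip.compress(b"in-memory data")
print(gunzip_bytes(sample))
```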


If you have Python 3.2 or above, it is much simpler, because gzip.decompress() takes the compressed bytes directly and no file-like buffer is needed:

    #!/usr/bin/env python3
    import gzip
    import urllib.request

    baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
    filename = "man-pages-4.03.tar.gz"
    outFilePath = filename[:-3]

    response = urllib.request.urlopen(baseURL + filename)

    with open(outFilePath, 'wb') as outfile:
        outfile.write(gzip.decompress(response.read()))
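If you need this in more than one place, the same approach can be wrapped in a function. This is only a sketch: the name `download_and_gunzip` and its parameters are hypothetical, and it still reads the whole response into memory, so it suits files that comfortably fit in RAM.

```python
import gzip
import urllib.request


def download_and_gunzip(url, out_path):
    """Fetch a gzip-compressed resource and write the decompressed bytes to out_path.

    Returns the number of decompressed bytes written.
    """
    # The response is closed as soon as its body has been read.
    with urllib.request.urlopen(url) as response:
        data = gzip.decompress(response.read())

    # Binary mode, because the decompressed content is arbitrary bytes.
    with open(out_path, "wb") as outfile:
        outfile.write(data)
    return len(data)
```

For the example above, the call would be download_and_gunzip(baseURL + filename, outFilePath).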

For those who are interested in history, see https://bugs.python.org/issue3488 and https://hg.python.org/cpython/rev/3fa0a9553402.