Download and decompress gzipped file in memory?

python file gzip urllib2 stringio

You need to seek to the beginning of compressedFile after writing to it but before passing it to gzip.GzipFile(). Otherwise the gzip module will start reading from the end of the buffer, and the file will appear empty to it. See below:

```python
#! /usr/bin/env python
import urllib2
import StringIO
import gzip

baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
filename = "man-pages-3.34.tar.gz"
outFilePath = "man-pages-3.34.tar"

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())
#
# Set the file's current position to the beginning
# of the file so that gzip.GzipFile can read
# its contents from the top.
#
compressedFile.seek(0)

decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')

# Open the output in binary mode: the decompressed tar data is not text.
with open(outFilePath, 'wb') as outfile:
    outfile.write(decompressedFile.read())
```
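To see why the seek(0) call matters without touching the network, here is a minimal Python 3 sketch. It assumes io.BytesIO as the Python 3 stand-in for StringIO.StringIO and uses gzip.compress() to build a made-up sample payload. After write(), the buffer's position sits at the end, so GzipFile finds nothing to read until the position is rewound.

```python
import gzip
import io

# Hypothetical sample payload; any bytes would do.
payload = b"hello, gzip\n" * 3
compressed = gzip.compress(payload)

buf = io.BytesIO()
buf.write(compressed)
# After write() the position is at the end of the buffer.
print(buf.tell() == len(compressed))  # True

# Reading now starts at the end, so GzipFile sees an empty stream.
with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    before_seek = gz.read()
print(before_seek)  # b''

# Rewind to the start and read again.
buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    after_seek = gz.read()
print(after_seek == payload)  # True
```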


For those using Python 3, the equivalent answer is:

```python
import urllib.request
import io
import gzip

# FILE_URL and OUTFILE_PATH are placeholders for your URL and output path.
response = urllib.request.urlopen(FILE_URL)
compressed_file = io.BytesIO(response.read())
decompressed_file = gzip.GzipFile(fileobj=compressed_file)

with open(OUTFILE_PATH, 'wb') as outfile:
    outfile.write(decompressed_file.read())
```
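The Python 3 version gets away without seek(0) because io.BytesIO(initial_bytes) starts positioned at 0, whereas writing into an empty BytesIO leaves the position at the end. A small offline sketch of both cases; the helper name gunzip_bytes is hypothetical:

```python
import gzip
import io


def gunzip_bytes(data: bytes) -> bytes:
    """Decompress gzip data held in memory via a file-like wrapper."""
    # BytesIO(data) starts at position 0, so no seek(0) is required.
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as gz:
        return gz.read()


sample = gzip.compress(b"example data")

# A buffer initialised with data starts at the beginning.
print(io.BytesIO(sample).tell())  # 0

# A buffer filled with write() ends up positioned at the end.
written = io.BytesIO()
written.write(sample)
print(written.tell() == len(sample))  # True

print(gunzip_bytes(sample))  # b'example data'
```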


If you have Python 3.2 or later, it is much simpler, because gzip.decompress() works directly on a bytes object:

```python
#!/usr/bin/env python3
import gzip
import urllib.request

baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
filename = "man-pages-4.03.tar.gz"
outFilePath = filename[:-3]  # strip the ".gz" suffix

response = urllib.request.urlopen(baseURL + filename)
with open(outFilePath, 'wb') as outfile:
    outfile.write(gzip.decompress(response.read()))
```
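gzip.decompress() returns the same bytes you would get by wrapping the data in a GzipFile, just in one call. The sketch below checks that equivalence offline; gunzip_response is a hypothetical helper that accepts anything with a read() method (such as the object urlopen() returns), and io.BytesIO stands in for the HTTP response.

```python
import gzip
import io


def gunzip_response(response) -> bytes:
    """Read a whole gzip stream from a file-like object and decompress it."""
    return gzip.decompress(response.read())


compressed = gzip.compress(b"man-pages sample\n" * 10)

# One call with gzip.decompress() ...
one_call = gunzip_response(io.BytesIO(compressed))

# ... versus the older GzipFile-over-BytesIO approach.
with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as gz:
    via_file = gz.read()

print(one_call == via_file)  # True
```

Both approaches keep the whole download and its decompressed contents in memory, which is what the question asks for.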

For those interested in the history, see https://bugs.python.org/issue3488 and https://hg.python.org/cpython/rev/3fa0a9553402.
