
How to tell if a file is gzip compressed?


The magic number for gzip compressed files is 1f 8b. Although testing for it is not 100% reliable, it is highly unlikely that "ordinary text files" start with those two bytes; in UTF-8 that sequence is not even legal.
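The UTF-8 claim is easy to check: 0x8b is a continuation byte, so it cannot follow the one-byte character 0x1f. A small sketch (`is_valid_utf8` is an illustrative name, not a library function):

```python
# The two gzip magic bytes (ID1 = 0x1f, ID2 = 0x8b, see RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0x8b is a continuation byte, so it cannot start a character on its own.
print(is_valid_utf8(GZIP_MAGIC))       # False
print(is_valid_utf8(b"plain text\n"))  # True
```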

Usually, though, gzip compressed files carry the suffix .gz. Even gzip(1) itself won't unpack files without it unless you --force it to. You could conceivably rely on that, but you would still have to handle a possible IOError (an alias of OSError in Python 3), which you have to do in any case.
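If you do go by the suffix, a minimal sketch could look like the following. `open_by_suffix` and `read_all` are hypothetical names, and the check assumes the suffix is exactly lowercase `.gz`. The `OSError` handler is still needed, because a file named `*.gz` may not actually contain gzip data.

```python
import gzip


def open_by_suffix(path):
    """Hypothetical helper: pick the opener from the file name alone.

    Names ending in '.gz' are opened with gzip.open() in binary mode,
    everything else with the built-in open(). The suffix says nothing
    about the real content, so errors can still surface on read().
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def read_all(path):
    """Read a whole file, returning None on I/O problems."""
    try:
        with open_by_suffix(path) as fh:
            return fh.read()
    except OSError as exc:
        # Covers missing files and, for '.gz' names, non-gzip content.
        print(f"could not read {path}: {exc}")
        return None
```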

One problem with your approach is that gzip.GzipFile() will not raise an exception if you feed it an uncompressed file. Only a later read() will. This means that you would probably have to implement some of your program logic twice. Ugly.
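To see this behavior, the following sketch writes a plain-text file and hands it to `gzip.GzipFile()`. The function name `demo_lazy_failure` is just for illustration. Constructing the object succeeds; the error only appears when `read()` parses the header.

```python
import gzip
import os
import tempfile


def demo_lazy_failure():
    """Return (constructed, read_failed) for a plain-text input file."""
    # Write a small file that is definitely not gzip data.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b"just some text\n")
        path = tmp.name

    constructed = read_failed = False
    try:
        fh = gzip.GzipFile(path)  # no exception: header not parsed yet
        constructed = True
        try:
            fh.read()             # the header is parsed (and rejected) here
        except OSError:           # gzip.BadGzipFile on Python 3.8+
            read_failed = True
        finally:
            fh.close()
    finally:
        os.remove(path)
    return constructed, read_failed


print(demo_lazy_failure())  # (True, True)
```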


Is there a cross-platform way, usable from Python, to determine whether a file is gzip compressed?

The accepted answer explains how to detect a gzip compressed file in general: test whether the first two bytes are 1f 8b. However, it does not show how to implement that in Python.

Here is one way:

```python
def is_gz_file(filepath):
    # Compare the first two bytes against the gzip magic number 1f 8b.
    with open(filepath, 'rb') as test_f:
        return test_f.read(2) == b'\x1f\x8b'
```
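A common next step after `is_gz_file` is to open a file transparently whether or not it is compressed, ignoring its name. The helper `open_maybe_gzip` below is hypothetical; it is a sketch that reads the two magic bytes and then reopens the file with either `gzip.open()` or the built-in `open()`.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(filepath):
    """Hypothetical helper: open `filepath` for binary reading,
    decompressing on the fly if it starts with the gzip magic number."""
    with open(filepath, "rb") as test_f:
        is_gz = test_f.read(2) == GZIP_MAGIC
    return gzip.open(filepath, "rb") if is_gz else open(filepath, "rb")


# Usage (file name is only an example):
# with open_maybe_gzip("data.bin") as fh:
#     data = fh.read()
```

Reopening the file keeps the sketch simple; for a non-seekable stream you would instead need to peek at the first bytes without consuming them.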


Testing the magic number of a gzip file is the standard way to go. However, as of Python 3.7 there is no need to compare the bytes yourself anymore. On the first read, the gzip module compares them for you and raises an exception if they do not match!

As of Python 3.7, this works:

```python
import gzip

# input_file: path of the file to check
with gzip.open(input_file, 'r') as fh:
    try:
        fh.read(1)
    except OSError:
        print('input_file is not a valid gzip file by OSError')
```

As of Python 3.8, this also works:

```python
import gzip

# input_file: path of the file to check
with gzip.open(input_file, 'r') as fh:
    try:
        fh.read(1)
    except gzip.BadGzipFile:
        print('input_file is not a valid gzip file by BadGzipFile')
```
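The snippets above print a message; if you want a reusable yes/no check instead, the same idea can be wrapped in a function. This is a sketch and the name `is_readable_gzip` is made up. It catches `OSError` (the parent class of `gzip.BadGzipFile`) so it also runs on 3.7, plus `EOFError` for a file that is cut off inside the gzip header.

```python
import gzip


def is_readable_gzip(filepath):
    """Return True if gzip can decompress the start of `filepath`.

    Catching OSError also swallows unrelated errors such as a missing
    file; catch gzip.BadGzipFile instead (3.8+) if you need to tell
    those cases apart.
    """
    try:
        with gzip.open(filepath, "rb") as fh:
            fh.read(1)  # the header is checked on the first read
    except (OSError, EOFError):
        return False
    return True
```

Note that an empty file passes this check: gzip treats zero-length input as an empty stream, so `read(1)` simply returns `b''`. Combine it with a size check if that matters.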