In Python, is there a concise way of comparing whether the contents of two text files are the same?


The low level way:

    from __future__ import with_statement

    with open(filename1) as f1:
        with open(filename2) as f2:
            if f1.read() == f2.read():
                ...

The high level way:

    import filecmp

    if filecmp.cmp(filename1, filename2, shallow=False):
        ...
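To see what `shallow=False` buys you, here is a small sketch using throwaway files. The helper `write_file` and the file names are made up for the demonstration. With the default `shallow=True`, `filecmp.cmp` treats two files as equal when their `os.stat()` signatures (type, size, modification time) match, without reading them; `shallow=False` forces a comparison of the actual contents.

```python
import filecmp
import os
import tempfile


def write_file(path, text):
    """Hypothetical helper: write text to path and return the path."""
    with open(path, "w") as f:
        f.write(text)
    return path


tmpdir = tempfile.mkdtemp()
path_a = write_file(os.path.join(tmpdir, "a.txt"), "hello\n")
path_b = write_file(os.path.join(tmpdir, "b.txt"), "hello\n")
path_c = write_file(os.path.join(tmpdir, "c.txt"), "HELLO\n")  # same size, different bytes

# shallow=False compares file contents rather than trusting the stat signature.
same = filecmp.cmp(path_a, path_b, shallow=False)
different = filecmp.cmp(path_a, path_c, shallow=False)

print(same, different)
```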


If you're going for even basic efficiency, you probably want to check the file size first:

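    import os

    if os.path.getsize(filename1) == os.path.getsize(filename2):
        # Binary mode, so the comparison is byte-for-byte like the size check.
        with open(filename1, 'rb') as f1, open(filename2, 'rb') as f2:
            if f1.read() == f2.read():
                pass  # Files are the same.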

This saves you reading every line of two files that aren't even the same size, and thus can't be the same.

(Even further than that, you could call out to a fast MD5sum of each file and compare those, but that's not "in Python", so I'll stop here.)
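For what it's worth, the hashing can also be done without leaving Python, using the standard library's `hashlib`. The sketch below is only an illustration; the names `file_digest` and `same_digest` and the 64 KiB block size are my own choices. Keep in mind that hashing still has to read every byte of both files, so for a single pair it is unlikely to beat a direct comparison. It mostly pays off when the digests are stored and reused, for example when checking one file against many.

```python
import hashlib


def file_digest(path, block_size=65536):
    """Return the hex MD5 digest of a file, reading it in blocks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for block in iter(lambda: f.read(block_size), b""):
            md5.update(block)
    return md5.hexdigest()


def same_digest(path1, path2):
    """Compare two files by digest (hypothetical helper for illustration)."""
    return file_digest(path1) == file_digest(path2)
```

Equal digests do not strictly prove equal contents, so follow up with a byte-for-byte comparison if that matters.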


This is a functional-style file comparison function. It returns False immediately if the files have different sizes; otherwise, it reads both files in 4 KiB blocks and returns False as soon as it finds the first difference:

    from __future__ import with_statement
    import os
    import itertools, functools, operator

    try:
        izip = itertools.izip  # Python 2
    except AttributeError:
        izip = zip  # Python 3

    def filecmp(filename1, filename2):
        "Do the two files have exactly the same contents?"
        with open(filename1, "rb") as fp1, open(filename2, "rb") as fp2:
            if os.fstat(fp1.fileno()).st_size != os.fstat(fp2.fileno()).st_size:
                return False  # different sizes ∴ not equal

            # set up one 4k-reader for each file
            fp1_reader = functools.partial(fp1.read, 4096)
            fp2_reader = functools.partial(fp2.read, 4096)

            # pair each 4k-chunk from the two readers while they do not return b'' (EOF)
            cmp_pairs = izip(iter(fp1_reader, b''), iter(fp2_reader, b''))

            # yield True for each pair of chunks that is not equal
            inequalities = itertools.starmap(operator.ne, cmp_pairs)

            # voilà; any() stops at first True value
            return not any(inequalities)

    if __name__ == "__main__":
        import sys
        # print(...) with a single argument works on both Python 2 and Python 3
        print(filecmp(sys.argv[1], sys.argv[2]))

Just a different take :)
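For comparison, the same early-exit logic can be written as a plain loop. This is only a sketch; the name `same_contents` and the `block_size` parameter are made up here, not taken from any library.

```python
import os


def same_contents(path1, path2, block_size=4096):
    """Return True if both files hold exactly the same bytes."""
    # Different sizes can never be equal, so skip reading altogether.
    if os.path.getsize(path1) != os.path.getsize(path2):
        return False
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        while True:
            block1 = f1.read(block_size)
            block2 = f2.read(block_size)
            if block1 != block2:
                return False  # first difference found, stop reading
            if not block1:
                return True  # both files reached EOF without a difference
```

It is longer than the `itertools` version, but the control flow is explicit, which some readers find easier to follow.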