Why do the md5 hashes of two tarballs of the same file differ?


tar czf outfile infiles is equivalent to

tar cf - infiles | gzip > outfile

The files differ because gzip stores its input's filename and modification time in the header of the compressed file. When the input is a pipe, it uses an empty string as the filename and the current time as the modification time.
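To see where these fields live, here is a rough sketch that compresses a regular file and prints the relevant header bytes. The offsets follow the gzip file format (RFC 1952); the scratch directory, the file names, and the header_bytes helper are made up for illustration.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'hello\n' > testfile

# Compress a regular file to stdout; by default gzip records its name and mtime.
gzip -c testfile > named.gz

# Hypothetical helper: print COUNT bytes of FILE starting at OFFSET,
# as decimal values separated by single spaces.
header_bytes() {
  od -A n -t u1 -j "$2" -N "$3" "$1" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

magic=$(header_bytes named.gz 0 2)   # ID1 ID2: the gzip magic number
flags=$(header_bytes named.gz 3 1)   # FLG: bit 0x08 (FNAME) means a name is stored
mtime=$(header_bytes named.gz 4 4)   # MTIME: little-endian Unix timestamp

echo "magic=$magic flags=$flags mtime_bytes=$mtime"
```

If the FNAME bit is set, the original filename follows the fixed 10-byte header as a NUL-terminated string, so two otherwise identical archives can differ in both the name and the four MTIME bytes.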

However, gzip also has a --no-name option (short form -n), which tells it not to store the name and timestamp in the file. So if you write out the expanded command explicitly, instead of using tar's -z option, you can pass this option to gzip.

tar cf - testfile | gzip --no-name > a.tar.gz
tar cf - testfile | gzip --no-name > b.tar.gz
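A quick way to check the result is to hash both archives and compare. This is only a sketch: the scratch directory and the md5_of helper are assumptions added for illustration, and the helper simply picks md5sum (common on Linux) or md5 -q (macOS), whichever is available.

```shell
# Work in a scratch directory with a small sample file.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'hello\n' > testfile

# Hypothetical helper: print only the md5 digest of a file.
md5_of() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum < "$1" | cut -d ' ' -f 1
  else
    md5 -q "$1"
  fi
}

# Build the same tarball twice, a second apart, so that any embedded
# gzip timestamp would differ between the two runs.
tar cf - testfile | gzip --no-name > a.tar.gz
sleep 1
tar cf - testfile | gzip --no-name > b.tar.gz

sum_a=$(md5_of a.tar.gz)
sum_b=$(md5_of b.tar.gz)
echo "$sum_a  a.tar.gz"
echo "$sum_b  b.tar.gz"
```

The two digests should match as long as testfile itself (contents, modification time, ownership, permissions) is unchanged between the two runs, because tar records that metadata in the archive.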

I tested this on OS X 10.6.8 and it works.


On macOS:

The --options section of man tar documents a !timestamp option, which leaves the timestamp out of the gzip archive. Usage:

tar --options '!timestamp' -cvzf archive.tgz filename

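This produces the same md5 sum for the same files with the same names, as long as the files' own metadata (such as modification times, which tar records) has not changed.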
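The check below is a sketch of how this could be verified. The !timestamp option belongs to bsdtar (the tar shipped with macOS), so the script first looks for "bsdtar" in the output of tar --version and skips the test otherwise; the scratch directory and the result variable are illustrative only.

```shell
# Work in a scratch directory with a small sample file.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'hello\n' > testfile

# Only bsdtar understands --options '!timestamp'; other tar
# implementations are expected to reject it, so skip them.
if tar --version 2>/dev/null | grep -qi 'bsdtar'; then
  tar --options '!timestamp' -czf a.tgz testfile
  sleep 1   # make sure an embedded timestamp would differ
  tar --options '!timestamp' -czf b.tgz testfile
  if cmp -s a.tgz b.tgz; then
    result=identical
  else
    result=different
  fi
else
  result=skipped
fi
echo "$result"
```

With a tar that is not bsdtar, the pipeline through gzip --no-name is the more portable route.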