Using hashlib to compute md5 digest of a file in Python 3
I think you wanted the for-loop to make successive calls to f.read(128). That can be done using iter() and functools.partial():
import hashlib
from functools import partial

def md5sum(filename):
    with open(filename, mode='rb') as f:
        d = hashlib.md5()
        for buf in iter(partial(f.read, 128), b''):
            d.update(buf)
    return d.hexdigest()

print(md5sum('utils.py'))
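To see how the two-argument form of iter() works here, note that it calls the given callable repeatedly and stops as soon as the callable returns the sentinel (b'' at end of file). A minimal illustration using an in-memory buffer instead of a real file:

```python
from functools import partial
import io

# Simulate a small binary file
f = io.BytesIO(b'abcdefgh')

# iter(callable, sentinel) keeps calling f.read(3) until it returns b''
parts = list(iter(partial(f.read, 3), b''))
print(parts)  # [b'abc', b'def', b'gh']
```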
In contrast, your original loop

    for buf in f.read(128):
        d.update(buf)

updates the hash with each of the first 128 byte values of the file. Since iterating over a bytes object produces int objects, you get the following calls, which cause the error you encountered in Python 3:
d.update(97)
d.update(98)
d.update(99)
d.update(100)
which is not what you want.
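You can verify both halves of that explanation directly: iterating over a bytes object yields ints, and md5.update() rejects an int in Python 3. A short sketch:

```python
import hashlib

data = b'abcd'

# Iterating over bytes yields ints, not 1-byte bytes objects
values = list(data)
print(values)  # [97, 98, 99, 100]

d = hashlib.md5()
try:
    d.update(data[0])  # data[0] is the int 97, so update() raises
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```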
Instead, you want:
def md5sum(filename):
    with open(filename, mode='rb') as f:
        d = hashlib.md5()
        while True:
            buf = f.read(4096)  # 128 is smaller than the typical filesystem block
            if not buf:
                break
            d.update(buf)
    return d.hexdigest()
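A quick way to convince yourself the chunked version is correct is to compare its result against hashing the whole file contents in a single call (a sketch using a throwaway temporary file; the contents are arbitrary):

```python
import hashlib
import os
import tempfile

def md5sum(filename):
    with open(filename, mode='rb') as f:
        d = hashlib.md5()
        while True:
            buf = f.read(4096)
            if not buf:
                break
            d.update(buf)
    return d.hexdigest()

# Write a throwaway file larger than one chunk, then compare the
# chunked digest against hashing the whole contents at once
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'spam and eggs\n' * 1000)
    name = tmp.name
try:
    with open(name, 'rb') as f:
        one_shot = hashlib.md5(f.read()).hexdigest()
    chunked = md5sum(name)
finally:
    os.remove(name)

print(chunked == one_shot)  # True
```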
I finally changed my code to the version below (which I find easy to understand) after asking the question. But I will probably change it to the version suggested by Raymond Hettinger using functools.partial.
import hashlib

def chunks(filename, chunksize):
    f = open(filename, mode='rb')
    buf = "Let's go"  # any non-empty value, so the loop body runs at least once
    while len(buf):
        buf = f.read(chunksize)
        yield buf

def md5sum(filename):
    d = hashlib.md5()
    for buf in chunks(filename, 128):
        d.update(buf)
    return d.hexdigest()