
Using hashlib to compute md5 digest of a file in Python 3


I think you wanted the for-loop to make successive calls to f.read(128). That can be done using iter() and functools.partial():

import hashlib
from functools import partial

def md5sum(filename):
    with open(filename, mode='rb') as f:
        d = hashlib.md5()
        # iter(callable, sentinel) calls f.read(128) until it returns b''
        for buf in iter(partial(f.read, 128), b''):
            d.update(buf)
    return d.hexdigest()

print(md5sum('utils.py'))
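To see what iter(partial(f.read, 128), b'') actually produces, here is a minimal sketch that swaps the file for an in-memory io.BytesIO object and uses a chunk size of 4 (both assumptions, to keep it self-contained). The two-argument form of iter() keeps calling the function until it returns the sentinel b'', which is what read() returns at end of file. The helper name read_chunks is hypothetical.

```python
import io
from functools import partial

def read_chunks(fileobj, size):
    # Hypothetical helper: call fileobj.read(size) repeatedly and stop
    # as soon as read() returns the sentinel b'' (end of file).
    return list(iter(partial(fileobj.read, size), b''))

chunks = read_chunks(io.BytesIO(b'abcdefghij'), 4)
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

The last chunk is simply shorter; no empty chunk is ever passed on, because iter() stops before yielding the sentinel.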


for buf in f.read(128):
    d.update(buf)

This loop updates the hash sequentially with each of the first 128 byte values of the file. Since iterating over a bytes object produces int objects, you get calls like the following, which cause the error you encountered in Python 3:

d.update(97)
d.update(98)
d.update(99)
d.update(100)

which is not what you want.
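A short check of both points, using b'abcd' as stand-in file content (an assumption): iterating a bytes object yields integers, and update() rejects an integer with a TypeError because it expects a bytes-like object.

```python
import hashlib

data = b'abcd'
print(list(data))  # [97, 98, 99, 100]

d = hashlib.md5()
try:
    d.update(data[0])  # data[0] is the int 97, not a bytes object
except TypeError as exc:
    print('TypeError:', exc)

# Passing the whole bytes object (or a slice such as data[0:1]) works.
d.update(data)
print(d.hexdigest() == hashlib.md5(b'abcd').hexdigest())  # True
```

Note that indexing a bytes object gives an int, while slicing it gives bytes, which is why data[0:1] would be accepted.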

Instead, you want:

import hashlib

def md5sum(filename):
    with open(filename, mode='rb') as f:
        d = hashlib.md5()
        while True:
            buf = f.read(4096)  # 128 is smaller than the typical filesystem block
            if not buf:  # read() returns b'' at end of file
                break
            d.update(buf)
        return d.hexdigest()
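One way to convince yourself that reading in chunks does not change the result is to compare the chunked digest with hashlib.md5() applied to the whole content at once. The sketch below assumes a throwaway temporary file filled with random bytes and a hypothetical md5_chunked() helper that takes the chunk size as a parameter; otherwise it follows the same while True / break loop.

```python
import hashlib
import os
import tempfile

def md5_chunked(filename, chunksize=4096):
    # Hypothetical helper: same loop as above, chunk size as a parameter.
    d = hashlib.md5()
    with open(filename, mode='rb') as f:
        while True:
            buf = f.read(chunksize)
            if not buf:
                break
            d.update(buf)
    return d.hexdigest()

data = os.urandom(10000)  # arbitrary test content
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)  # file is closed when the with-block exits
try:
    expected = hashlib.md5(data).hexdigest()
    for size in (1, 128, 4096, 65536):
        # The digest depends only on the bytes fed in, not on how they are split.
        assert md5_chunked(tmp.name, size) == expected
finally:
    os.remove(tmp.name)
```

The chunk size mainly affects how many read() and update() calls are made, and therefore speed and memory use, not the resulting digest.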


After asking the question, I changed my code to the version below, which I find easy to understand. I will probably switch to the version using functools.partial suggested by Raymond Hettinger, though.

import hashlib

def chunks(filename, chunksize):
    # Yield successive chunks of the file; the with-block closes it when done.
    with open(filename, mode='rb') as f:
        while True:
            buf = f.read(chunksize)
            if not buf:  # end of file
                break
            yield buf

def md5sum(filename):
    d = hashlib.md5()
    for buf in chunks(filename, 128):
        d.update(buf)
    return d.hexdigest()
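On Python 3.8 or later, an assignment expression (:=) lets the read and the end-of-file test share one line, so no placeholder initial value is needed. This is a swapped-in idiom, not the code from the question, and as an assumption this variant of chunks() takes an already opened file object instead of a filename, so the caller's with-block decides when the file is closed.

```python
import hashlib
import io

def chunks(fileobj, chunksize):
    # Requires Python 3.8+. The loop ends when read() returns b'' at end of file.
    while buf := fileobj.read(chunksize):
        yield buf

def md5sum(filename, chunksize=4096):
    d = hashlib.md5()
    with open(filename, mode='rb') as f:
        for buf in chunks(f, chunksize):
            d.update(buf)
    return d.hexdigest()

print(list(chunks(io.BytesIO(b'abcdefg'), 3)))  # [b'abc', b'def', b'g']
```

Because chunks() only needs something with a read() method, it can be exercised with io.BytesIO as above without touching the filesystem.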