
Extract the SHA1 hash from a torrent file


I wrote a piece of Python 2 code that verifies the hashes of downloaded files against what's in a .torrent file. If you want to check a download for corruption, you may find this useful.

You need the bencode package to use this. Bencode is the serialization format used in .torrent files. It can marshal lists, dictionaries, strings and numbers somewhat like JSON.
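To make the format concrete, here is a minimal sketch of a bencode decoder in Python 3. It is not the bencode package's implementation: `bdecode_sketch` and `_decode_at` are hypothetical names used only for illustration, and the code does no validation of malformed input.

```python
def _decode_at(data, pos):
    """Decode one bencoded value starting at data[pos]; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode_at(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = _decode_at(data, pos)
            value, pos = _decode_at(data, pos)
            result[key] = value
        return result, pos + 1
    # byte string: <length>:<bytes>
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


def bdecode_sketch(data):
    """Decode a complete bencoded byte string (hypothetical helper)."""
    value, _ = _decode_at(data, 0)
    return value


# Made-up sample input shaped like a tiny metainfo dictionary.
sample = b"d4:infod4:name8:test.iso12:piece lengthi16384eee"
print(bdecode_sketch(sample))
```

Strings come back as `bytes` rather than `str`, because bencode strings are raw byte sequences (the `pieces` value, for instance, is binary data).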

The code takes the hashes contained in the info['pieces'] string:

torrent_file = open(sys.argv[1], "rb")
metainfo = bencode.bdecode(torrent_file.read())
info = metainfo['info']
pieces = StringIO.StringIO(info['pieces'])

That string contains a succession of 20-byte SHA1 hashes (one for each piece). These hashes are then compared with the hashes of the pieces read from the on-disk file(s).
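As a rough illustration of that layout, the sketch below (Python 3, where the value is a `bytes` object) splits a pieces string into its 20-byte digests and compares them with freshly computed hashes. The function name `split_piece_hashes` and the sample data are assumptions made for the example, not part of the original script.

```python
import hashlib

HASH_SIZE = 20  # a SHA-1 digest is 20 bytes long


def split_piece_hashes(pieces):
    """Split the concatenated digest string into a list of 20-byte hashes."""
    if len(pieces) % HASH_SIZE != 0:
        raise ValueError("pieces length is not a multiple of 20")
    return [pieces[i:i + HASH_SIZE] for i in range(0, len(pieces), HASH_SIZE)]


# Hypothetical data: two tiny "pieces" and their concatenated digests,
# standing in for info['pieces'] from a real torrent.
chunks = [b"first piece", b"second piece"]
pieces_field = b"".join(hashlib.sha1(c).digest() for c in chunks)

expected = split_piece_hashes(pieces_field)
matches = [hashlib.sha1(c).digest() == h for c, h in zip(chunks, expected)]
print(matches)  # [True, True]
```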

The only complicated part of this code is handling multi-file torrents because a single torrent piece can span more than one file (internally BitTorrent treats multi-file downloads as a single contiguous file). I'm using the generator function pieces_generator() to abstract that away.
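The idea of pieces spanning file boundaries can be sketched in isolation. The Python 3 example below is a simplified variant of the same technique, using in-memory `io.BytesIO` streams as stand-ins for files on disk; `iter_pieces` is a hypothetical name and the byte strings are made-up data.

```python
import io


def iter_pieces(streams, piece_length):
    """Yield fixed-size pieces from several binary streams read back to back."""
    piece = b""
    for stream in streams:
        while True:
            # Top up the current piece with whatever it still needs.
            piece += stream.read(piece_length - len(piece))
            if len(piece) < piece_length:
                break      # stream exhausted; keep filling from the next one
            yield piece
            piece = b""
    if piece:
        yield piece        # the final piece may be shorter than piece_length


# Three "files" of 5, 2 and 4 bytes, with a piece length of 4.
files = [io.BytesIO(b"abcde"), io.BytesIO(b"fg"), io.BytesIO(b"hijk")]
print(list(iter_pieces(files, 4)))  # [b'abcd', b'efgh', b'ijk']
```

The second piece, `b'efgh'`, draws bytes from all three streams, which is the case a per-file loop would get wrong.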

You may want to read the BitTorrent spec to understand this in more detail.

Full code below:

import sys, os, hashlib, StringIO, bencode

def pieces_generator(info):
    """Yield pieces from download file(s)."""
    piece_length = info['piece length']
    if 'files' in info: # yield pieces from a multi-file torrent
        piece = ""
        for file_info in info['files']:
            path = os.sep.join([info['name']] + file_info['path'])
            print path
            sfile = open(path.decode('UTF-8'), "rb")
            while True:
                piece += sfile.read(piece_length-len(piece))
                if len(piece) != piece_length:
                    sfile.close()
                    break
                yield piece
                piece = ""
        if piece != "":
            yield piece
    else: # yield pieces from a single file torrent
        path = info['name']
        print path
        sfile = open(path.decode('UTF-8'), "rb")
        while True:
            piece = sfile.read(piece_length)
            if not piece:
                sfile.close()
                return
            yield piece

def corruption_failure():
    """Display error message and exit"""
    print("download corrupted")
    exit(1)

def main():
    # Open torrent file
    torrent_file = open(sys.argv[1], "rb")
    metainfo = bencode.bdecode(torrent_file.read())
    info = metainfo['info']
    pieces = StringIO.StringIO(info['pieces'])
    # Iterate through pieces
    for piece in pieces_generator(info):
        # Compare piece hash with expected hash
        piece_hash = hashlib.sha1(piece).digest()
        if (piece_hash != pieces.read(20)):
            corruption_failure()
    # ensure we've read all pieces
    if pieces.read():
        corruption_failure()

if __name__ == "__main__":
    main()


Here is how I extracted the info hash value from a torrent file:

#!/usr/bin/python
import sys, os, hashlib, StringIO
import bencode

def main():
    # Open torrent file
    torrent_file = open(sys.argv[1], "rb")
    metainfo = bencode.bdecode(torrent_file.read())
    info = metainfo['info']
    print hashlib.sha1(bencode.bencode(info)).hexdigest()

if __name__ == "__main__":
    main()

It is the same as running the command:

transmissioncli -i test.torrent 2>/dev/null | grep "^hash:" | awk '{print $2}'
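For readers without the bencode package, the same computation can be sketched in Python 3 with a small hand-written encoder. `bencode_sketch` is a hypothetical helper written for this example, not the package's API, and the `info` dictionary is made-up data rather than a real torrent. Re-encoding gives the original info hash only if the torrent's info dictionary was written in canonical form (keys in sorted order), which is an assumption here.

```python
import hashlib


def bencode_sketch(value):
    """Encode ints, bytes/str, lists and dicts in bencode form."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode_sketch(v) for v in value) + b"e"
    if isinstance(value, dict):
        out = [b"d"]
        for key in sorted(value):          # keys must appear in sorted order
            out.append(bencode_sketch(key))
            out.append(bencode_sketch(value[key]))
        out.append(b"e")
        return b"".join(out)
    raise TypeError("cannot bencode %r" % type(value))


# Made-up info dictionary for a 4-byte, single-piece file.
info = {
    b"name": b"a.txt",
    b"length": 4,
    b"piece length": 4,
    b"pieces": hashlib.sha1(b"abcd").digest(),
}

# The info hash is the SHA-1 of the bencoded info dictionary.
info_hash = hashlib.sha1(bencode_sketch(info)).hexdigest()
print(info_hash)
```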

Hope it helps :)


According to this, you should be able to find the md5sums of files by searching for the part of the data that looks like:

d[...]6:md5sum32:[hash is here][...]e

(A per-file SHA1 hash is not part of the spec; SHA1 is used only for the piece hashes and the info hash.)
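If the torrent has already been decoded, the optional md5sum values can be read from the resulting dictionary instead of searching the raw data. The Python 3 sketch below assumes a decoded info dictionary with `bytes` keys, with md5sum stored in the info dictionary for a single-file torrent and in each entry of `files` for a multi-file one. `collect_md5sums` is a hypothetical name and the sample dictionary is made up; many torrents omit the key entirely, in which case the value is `None`.

```python
def collect_md5sums(info):
    """Return {path: md5 hex string or None} for the files described by info."""
    if b"files" in info:                               # multi-file torrent
        result = {}
        for entry in info[b"files"]:
            path = b"/".join([info[b"name"]] + entry[b"path"])
            result[path] = entry.get(b"md5sum")
        return result
    return {info[b"name"]: info.get(b"md5sum")}        # single-file torrent


# Made-up decoded info dictionary; the second file has no md5sum key.
info = {
    b"name": b"example",
    b"files": [
        {b"path": [b"a.txt"], b"length": 3,
         b"md5sum": b"900150983cd24fb0d6963f7d28e17f72"},
        {b"path": [b"sub", b"b.txt"], b"length": 0},
    ],
}
print(collect_md5sums(info))
```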