How to hash a large object (dataset) in Python?



Thanks to John Montgomery I think I have found a solution, and it should have less overhead than converting every number in possibly huge arrays to strings:

I can create a byte view of an array and use that to update the hash. This seems to give the same digest as hashing the array directly, which makes sense because the view shares the array's underlying memory, so the same bytes end up being hashed:

>>> import hashlib
>>> import numpy
>>> a = numpy.random.rand(10, 100)
>>> b = a.view(numpy.uint8)
>>> print a.dtype, b.dtype   # a and b have different data types
float64 uint8
>>> hashlib.sha1(a).hexdigest()   # hash of the array itself
'794de7b1316b38d989a9040e6e26b9256ca3b5eb'
>>> hashlib.sha1(b).hexdigest()   # hash of the byte view -- same digest
'794de7b1316b38d989a9040e6e26b9256ca3b5eb'


What's the format of the data in the arrays? Couldn't you just iterate through the arrays, convert each value into a string (via some reproducible means) and then feed that into your hash via update?

e.g.

import hashlib

m = hashlib.md5()  # or hashlib.sha1(), etc.
for value in array:  # array contains the data
    m.update(str(value))

Don't forget though that numpy arrays won't provide __hash__() because they are mutable. So be careful not to modify the arrays after you've calculated your hash (as the hash will no longer be the same).
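As a quick illustration (my own sketch, using the byte-view approach from above), modifying the array in place immediately invalidates a previously computed digest:

import hashlib
import numpy

a = numpy.random.rand(10, 100)
before = hashlib.sha1(a.view(numpy.uint8)).hexdigest()

a[0, 0] += 1.0  # modify the array in place after hashing
after = hashlib.sha1(a.view(numpy.uint8)).hexdigest()

print(before == after)  # False: the old digest no longer describes the data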


There is a package, joblib, for memoizing functions that take numpy arrays as inputs. I found it via this question.
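A minimal sketch of how joblib's memoization is typically used (the cache directory path and the example function are arbitrary choices of mine, and the exact Memory constructor arguments can vary between joblib versions):

import numpy
from joblib import Memory

memory = Memory("./joblib_cache", verbose=0)  # on-disk cache directory

@memory.cache
def expensive_computation(data):
    # stand-in for a slow function of a large numpy array
    return data.sum()

a = numpy.random.rand(10, 100)
print(expensive_computation(a))  # computed and written to the cache
print(expensive_computation(a))  # served from the cache on the second call

joblib hashes the input arrays itself to decide whether a cached result can be reused, which is essentially the problem discussed above.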