
Memoize function result based on selected parameters


Here is my implementation. I had to mock up some data, so if it doesn't do quite what you need, I'm happy to tweak it a bit.

import numpy as np

def memo(hashTable, fileName, signal: np.ndarray, sampling_frequency=16000, win_len=512, hop=256, win_type='hanning'):
    # Build a key from the parameters we care about; the signal itself is deliberately left out.
    new_hash = hash(fileName + str(sampling_frequency) + str(win_len) + str(hop) + win_type)
    if new_hash in hashTable:
        return hashTable[new_hash]
    else:
        answer = spectrogram(signal, sampling_frequency, win_len, hop, win_type)
        hashTable[new_hash] = answer
        return answer

def spectrogram(signal: np.ndarray, sampling_frequency=16000, win_len=512, hop=256, win_type='hanning'):
    # Dummy implementation: returns a 2x12 array that depends on hop so different calls are distinguishable.
    makeArrayUnique = hop - 256
    return np.arange(makeArrayUnique, 24 + makeArrayUnique).reshape(2, 12)

def testHash():
    hashTable = {}
    dummySignal = np.zeros(10)
    print('First call', memo(hashTable, 'file1', signal=dummySignal))
    print('Second Call', memo(hashTable, 'file1', signal=dummySignal, hop=260))
    print('First call again', memo(hashTable, 'file1', signal=dummySignal))
    print('Hash Table', hashTable)

Output showing three calls but only two entries in the hash table:

>>> testHash()
First call [[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]
Second Call [[ 4  5  6  7  8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23 24 25 26 27]]
First call again [[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]
Hash Table {-4316472197502448580: array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]]), 6772234510013844540: array([[ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]])}


I decided to post my own answer that builds on @Ethan's answer (+1 vote) by adding two elements:

  1. A bounded cache. That was one of my prerequisites, which is also why I could not accept Ethan's answer: it is unbounded and would quickly exhaust my memory.

  2. I feel it is more elegant; it uses a decorator and a module designed for caching (which also catches some corner cases). It's more reusable and therefore friendlier for others.

import numpy
from cachetools.keys import hashkey
from cachetools import cached, LRUCache

def mykey(signal, *args, **kwargs):
    # Build the cache key from every argument except the signal.
    key = hashkey(*args, **kwargs)
    return key

@cached(LRUCache(maxsize=6), key=mykey)
def spectrogram(signal: numpy.ndarray, filename, sampling_frequency=16000, win_len=512, hop=256, win_type='hanning'):
    ...

In essence, I simply ignore the signal and instead take filename as an extra parameter for caching. Under certain circumstances even the filename would not be needed: if a separate process is spawned per file, this safeguard is unnecessary, since the cache cannot be shared between processes anyway.
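To make the caching behaviour concrete, here is a minimal, self-contained sketch; the spectrogram body and the print statements are placeholders of mine, not the real implementation, but the key function is the one above. Two calls with the same filename but different signal arrays hit the cache:

import numpy
from cachetools.keys import hashkey
from cachetools import cached, LRUCache

def mykey(signal, *args, **kwargs):
    # Drop the large, unhashable signal and key only on the remaining arguments.
    return hashkey(*args, **kwargs)

@cached(LRUCache(maxsize=6), key=mykey)
def spectrogram(signal: numpy.ndarray, filename, sampling_frequency=16000, win_len=512, hop=256, win_type='hanning'):
    print('computing', filename)   # printed only on a cache miss
    return numpy.zeros((2, 12))    # placeholder result for the demo

spectrogram(numpy.zeros(10), 'file1')   # miss: prints 'computing file1'
spectrogram(numpy.ones(10), 'file1')    # hit: same filename, different signal
spectrogram(numpy.zeros(10), 'file2')   # miss: prints 'computing file2'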

Bonus

I also tried Memory from joblib, and it performed well too. Here's a snippet:

import numpy
from joblib import Memory

memory = Memory('cachedir', verbose=0, bytes_limit=100000)

@memory.cache
def spectrogram(signal: numpy.ndarray, sampling_frequency=16000, win_len=512, hop=256, win_type='hanning'):
    ...

It performed on average 25% worse than the first solution because:

  • it writes to disk
  • it computes a hash over the complete numpy.ndarray

Considering the above, it's still a very good result.
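For completeness, here is a hedged sketch of how such a comparison could be timed; the placeholder compute body, the helper names, and the 1000-call loop are assumptions of mine, not the benchmark I actually ran:

import time
import numpy
from cachetools import cached, LRUCache
from cachetools.keys import hashkey
from joblib import Memory

def compute(signal):
    # Stand-in for the real spectrogram computation.
    return numpy.zeros((2, 12))

@cached(LRUCache(maxsize=6), key=lambda signal, filename: hashkey(filename))
def spec_cachetools(signal, filename):
    return compute(signal)

memory = Memory('cachedir', verbose=0)

@memory.cache
def spec_joblib(signal):
    # joblib hashes the whole signal array on every call and reads the result from disk.
    return compute(signal)

signal = numpy.random.rand(16000)

for name, call in [('cachetools', lambda: spec_cachetools(signal, 'file1')),
                   ('joblib', lambda: spec_joblib(signal))]:
    call()                                  # warm the cache
    start = time.perf_counter()
    for _ in range(1000):
        call()                              # repeated cached calls
    print(name, time.perf_counter() - start)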


My Redis cache package actually has what you need: memoization based on selected fields, although you need the file name in the function parameters. https://github.com/Yiling-J/cacheme:

@cacheme(key=lambda c: f'cache:{c.sampling_frequency}/{c.file}/{c.win_len}/{c.hop}/{c.win_type}')
def spectrogram(signal, file, sampling_frequency, win_len, hop, win_type):
    return something

Also, this package uses Redis as its backend and has lots of features; if you only need memoization based on selected fields, just take a look at the source code to see how that lambda part works.
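The lambda-key idea itself does not depend on Redis. Below is a minimal, dependency-free sketch of the same pattern using a plain dict; memoize_with_key, its key_builder argument, and the placeholder return value are hypothetical names of mine, not part of cacheme:

from functools import wraps

def memoize_with_key(key_builder):
    cache = {}
    def decorator(func):
        @wraps(func)
        def wrapper(signal, file, sampling_frequency, win_len, hop, win_type):
            # Build the key from everything except the heavy signal argument.
            key = key_builder(file, sampling_frequency, win_len, hop, win_type)
            if key not in cache:
                cache[key] = func(signal, file, sampling_frequency, win_len, hop, win_type)
            return cache[key]
        return wrapper
    return decorator

@memoize_with_key(lambda file, sf, wl, hop, wt: f'cache:{sf}/{file}/{wl}/{hop}/{wt}')
def spectrogram(signal, file, sampling_frequency, win_len, hop, win_type):
    return 'spectrogram of ' + file   # placeholder for the real computation

spectrogram([0.0] * 10, 'file1', 16000, 512, 256, 'hanning')   # computed
spectrogram([1.0] * 10, 'file1', 16000, 512, 256, 'hanning')   # served from the dict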