Memoize function result based on selected parameters
Here is my implementation. I had to mock up some data, so if it doesn't do quite what you need, I'm happy to tweak it a bit.
    import numpy as np

    def memo(hashTable, fileName, signal: np.ndarray, sampling_frequency=16000,
             win_len=512, hop=256, win_type='hanning'):
        # key on everything except the signal itself
        new_hash = hash(fileName + str(sampling_frequency) + str(win_len)
                        + str(hop) + win_type)
        if new_hash in hashTable:
            return hashTable[new_hash]
        else:
            answer = spectrogram(signal, sampling_frequency, win_len, hop, win_type)
            hashTable[new_hash] = answer
            return answer

    def spectrogram(signal: np.ndarray, sampling_frequency=16000,
                    win_len=512, hop=256, win_type='hanning'):
        # mock computation so each parameter set yields a distinct array
        makeArrayUnique = hop - 256
        return np.arange(makeArrayUnique, 24 + makeArrayUnique).reshape(2, 12)

    def testHash():
        hashTable = {}
        dummySignal = np.zeros(10)
        print('First call', memo(hashTable, 'file1', signal=dummySignal))
        print('Second Call', memo(hashTable, 'file1', signal=dummySignal, hop=260))
        print('First call again', memo(hashTable, 'file1', signal=dummySignal))
        print('Hash Table', hashTable)
Output showing three calls but only two entries in the hash table:
    >>> testHash()
    First call [[ 0  1  2  3  4  5  6  7  8  9 10 11]
     [12 13 14 15 16 17 18 19 20 21 22 23]]
    Second Call [[ 4  5  6  7  8  9 10 11 12 13 14 15]
     [16 17 18 19 20 21 22 23 24 25 26 27]]
    First call again [[ 0  1  2  3  4  5  6  7  8  9 10 11]
     [12 13 14 15 16 17 18 19 20 21 22 23]]
    Hash Table {-4316472197502448580: array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
           [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]]), 6772234510013844540: array([[ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
           [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]])}
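One caveat worth noting: hashing a concatenated string can collide for distinct parameter sets, because str(win_len) + str(hop) is '512256' both for (512, 256) and for (5122, 56). A tuple key avoids this entirely. Here is a sketch of the same memo function with a tuple key, reusing the mock spectrogram from above:

```python
import numpy as np

def spectrogram(signal, sampling_frequency=16000, win_len=512, hop=256,
                win_type='hanning'):
    # mock standing in for the real computation, as in the answer above
    offset = hop - 256
    return np.arange(offset, 24 + offset).reshape(2, 12)

def memo(hashTable, fileName, signal, sampling_frequency=16000,
         win_len=512, hop=256, win_type='hanning'):
    # tuple of the selected parameters; unlike the concatenated string,
    # (512, 256) and (5122, 56) map to distinct keys
    key = (fileName, sampling_frequency, win_len, hop, win_type)
    if key not in hashTable:
        hashTable[key] = spectrogram(signal, sampling_frequency, win_len,
                                     hop, win_type)
    return hashTable[key]
```

Tuples are hashable as long as all of their elements are, so this works for any mix of strings and numbers without manual string formatting.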
I decided to post my own answer that builds on @Ethan's answer (+1 vote) by adding two elements:
- Bounding of the cache. That was one of my prerequisites, which is also why I could not accept Ethan's answer: it is unbounded and would quickly exhaust my memory.
- I feel it is more elegant; it uses a decorator and a module designed for caching (which also catches some corner cases). It's more reusable and therefore friendlier to others.
    from cachetools.keys import hashkey
    from cachetools import cached, LRUCache
    import numpy

    def mykey(signal, *args, **kwargs):
        # drop the signal; build the key only from the remaining arguments
        key = hashkey(*args, **kwargs)
        return key

    @cached(LRUCache(maxsize=6), key=mykey)
    def spectrogram(signal: numpy.ndarray, filename, sampling_frequency=16000,
                    win_len=512, hop=256, win_type='hanning'):
        ...
In essence, I am simply ignoring the signal and instead taking filename as an extra parameter for caching. Under certain circumstances even the filename would not be needed: if a separate process is spawned per file, there's no need for this safeguard, as the cache cannot be shared between processes anyway.
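For readers who prefer to avoid an extra dependency, the same idea can be sketched with only the standard library: a bounded functools.lru_cache keyed on filename and parameters, with the signal looked up out of band so the unhashable array never enters the key. The names here are mine, purely for illustration:

```python
from functools import lru_cache

import numpy as np

def compute_spectrogram(signal, sampling_frequency, win_len, hop, win_type):
    # placeholder standing in for the real spectrogram computation
    return np.arange(24).reshape(2, 12) + hop

def make_cached_spectrogram(maxsize=6):
    signals = {}  # side table: filename -> signal, kept out of the cache key

    @lru_cache(maxsize=maxsize)  # bounded, LRU-evicting cache
    def cached(filename, sampling_frequency, win_len, hop, win_type):
        return compute_spectrogram(signals[filename], sampling_frequency,
                                   win_len, hop, win_type)

    def spectrogram(signal, filename, sampling_frequency=16000,
                    win_len=512, hop=256, win_type='hanning'):
        signals[filename] = signal
        return cached(filename, sampling_frequency, win_len, hop, win_type)

    spectrogram.cache_info = cached.cache_info  # expose hit/miss statistics
    return spectrogram
```

This mirrors the cachetools version: the cache is bounded (maxsize=6) and keyed only on the selected parameters, never on the array itself.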
Bonus
I also decided to try Memory from joblib, and it also performed well. Here's a snippet:
    from joblib import Memory
    import numpy

    memory = Memory('cachedir', verbose=0, bytes_limit=100000)

    @memory.cache
    def spectrogram(signal: numpy.ndarray, sampling_frequency=16000,
                    win_len=512, hop=256, win_type='hanning'):
        ...
It performed on average 25% worse than the first solution because:
- it writes to disk
- it computes a hash over the complete numpy.ndarray
Considering the above, it's still a great score.
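To make those two cost factors concrete, here is a small stdlib-only sketch (the names and layout are mine, not joblib's actual implementation) that, like joblib, hashes the complete array and persists results to disk:

```python
import hashlib
import os
import pickle
import tempfile

import numpy as np

# hypothetical cache directory, created fresh for this sketch
CACHE_DIR = tempfile.mkdtemp(prefix='speccache')

def array_key(signal: np.ndarray, **params) -> str:
    # hash over the complete array bytes plus the parameters --
    # this is the expensive step compared with hashing a short filename
    h = hashlib.sha256(signal.tobytes())
    h.update(repr(sorted(params.items())).encode())
    return h.hexdigest()

def disk_cached_spectrogram(signal, **params):
    key = array_key(signal, **params)
    path = os.path.join(CACHE_DIR, key + '.pkl')
    if os.path.exists(path):
        with open(path, 'rb') as f:  # cache hit: read back from disk
            return pickle.load(f)
    result = np.arange(24).reshape(2, 12)  # stand-in for the real computation
    with open(path, 'wb') as f:  # disk write: the second cost factor
        pickle.dump(result, f)
    return result
```

Both costs scale with the signal length, which is presumably why the in-memory, filename-keyed approach comes out ahead.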
My Redis cache package actually has what you need: memoization based on selected fields. But you do need the file name in the function parameters. https://github.com/Yiling-J/cacheme:
    @cacheme(key=lambda c: f'cache:{c.sampling_frequency}/{c.file}/{c.win_len}/{c.hop}/{c.win_type}')
    def spectrogram(signal, file, sampling_frequency, win_len, hop, win_type):
        return something
This package uses Redis as its backend and has lots of features; if you only need memoization based on selected fields, just take a look at the source code to see how that lambda key works.
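The lambda-key pattern itself does not require Redis. Here is a minimal in-memory sketch of how such a key function can see the call's arguments by name; this is my own illustration of the idea, not cacheme's actual implementation:

```python
import inspect
from functools import wraps
from types import SimpleNamespace

def memoize_with_key(key):
    # minimal in-memory stand-in for the key= lambda idea above:
    # the key function receives the call's arguments as attributes
    def decorator(func):
        sig = inspect.signature(func)
        cache = {}

        @wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            # expose arguments as c.file, c.hop, etc., to the key lambda
            k = key(SimpleNamespace(**bound.arguments))
            if k not in cache:
                cache[k] = func(*args, **kwargs)
            return cache[k]
        return wrapper
    return decorator

@memoize_with_key(key=lambda c: f'cache:{c.sampling_frequency}/{c.file}/{c.hop}')
def spectrogram(signal, file, sampling_frequency=16000, hop=256):
    return ('spectrogram of', file, hop)  # stand-in result
```

Because the signal never appears in the key lambda, two calls with different signals but the same file and parameters share one cache entry, exactly as in the cacheme example.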