
Python: Pre-loading memory


This could be an XY problem, the source of which is the assumption that you must use pickles at all. They are awful to deal with because of how they manage dependencies, and for that reason they are fundamentally a poor choice for any long-term data storage.

The source financial data is almost certainly in some tabular form to begin with, so it may be possible to request it in a friendlier format.

In the meantime, a simple middleware that deserializes the pickles and re-serializes them into the new format will smooth the transition:

input -> load pickle -> write -> output

Converting your workflow to use Parquet or Feather, which are designed to be efficient to read and write, will almost certainly make a considerable difference to your load speed.



You may also be able to achieve this with hickle, which internally uses the HDF5 format, ideally making it significantly faster than pickle while still behaving like one.


An alternative to storing the unpickled data in memory would be to store the pickle in a ramdisk, so long as most of the time overhead comes from disk reads. Example code (to run in a terminal) is below.

sudo mkdir /mnt/pickle
sudo mount -o size=1536M -t tmpfs none /mnt/pickle
cp path/to/pickle.pkl /mnt/pickle/pickle.pkl

Then you can access the pickle at /mnt/pickle/pickle.pkl. Note that you can change the file names and extensions to whatever you want. If disk reads are not the biggest bottleneck, you might not see a speed increase. If you run out of memory, you can try turning down the size of the ramdisk (I set it to 1536 MB, i.e. 1.5 GB).


You can use a shareable list. You will have one Python program running that loads the file and keeps it in memory, and another Python program that can take the data from memory. Whatever your data is, you can load it into a dictionary, dump it as JSON, and then reload the JSON. So:

Program1

import pickle
import json
from multiprocessing.managers import SharedMemoryManager

YOUR_DATA = pickle.load(open(DATA_ROOT + pickle_name, 'rb'))
data_dict = {'DATA': YOUR_DATA}
data_dict_json = json.dumps(data_dict)
smm = SharedMemoryManager()
smm.start()
sl = smm.ShareableList(['alpha', 'beta', data_dict_json])
print(sl)
# smm.shutdown()  # commented out for now, but you will need to call it eventually

The output will look like this

#OUTPUT
ShareableList(['alpha', 'beta', "your data in json format"], name='psm_12abcd')

Now in Program2:

from multiprocessing import shared_memory

load_from_mem = shared_memory.ShareableList(name='psm_12abcd')
load_from_mem[1]
# OUTPUT: 'beta'
load_from_mem[2]
# OUTPUT: your data in JSON format

You can find more in the documentation: https://docs.python.org/3/library/multiprocessing.shared_memory.html
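The two programs above can be sketched as a single self-contained round trip, run in one process for demonstration; in practice the producer and consumer are separate programs, and the shared-memory name (`psm_12abcd` above) is passed between them out of band. The example data here is hypothetical.

```python
import json
from multiprocessing import shared_memory
from multiprocessing.managers import SharedMemoryManager

# "Program 1": serialize the data to JSON and publish it in a ShareableList.
data = {"DATA": {"ticker": "AAPL", "price": 189.5}}  # stand-in for the unpickled data
payload = json.dumps(data)

smm = SharedMemoryManager()
smm.start()
sl = smm.ShareableList(["alpha", "beta", payload])

# "Program 2": attach to the same list by name and decode the JSON.
reader = shared_memory.ShareableList(name=sl.shm.name)
restored = json.loads(reader[2])

reader.shm.close()
smm.shutdown()
```

Note that `ShareableList` slots have a fixed size set at creation, so this works for a one-shot publish of a JSON string, not for growing the data afterwards.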