Python: Pre-loading memory
This could be an XY problem, the source of which is the assumption that you must use pickles at all. Pickles are awful to deal with because of how they manage dependencies, and that makes them a fundamentally poor choice for any long-term data storage.
The source financial data is almost certainly in some tabular form to begin with, so it may be possible to request it in a friendlier format.
In the meantime, a simple middleware that deserializes the pickles and reserializes them in the new format will smooth the transition:
input -> load pickle -> write -> output
Converting your workflow to use Parquet or Feather, which are designed to be efficient to read and write, will almost certainly make a considerable difference to your load speed.
Further relevant links
- Answer to How to reversibly store and load a Pandas dataframe to/from disk
- What are the pros and cons of parquet format compared to other formats?
You may also be able to achieve this with hickle, which internally uses the HDF5 format, ideally making it significantly faster than pickle while still behaving like one.
An alternative to storing the unpickled data in memory would be to store the pickle in a ramdisk, so long as most of the time overhead comes from disk reads. Example code (to run in a terminal) is below.
```shell
sudo mkdir /mnt/pickle
sudo mount -o size=1536M -t tmpfs none /mnt/pickle
cp path/to/pickle.pkl /mnt/pickle/pickle.pkl
```
Then you can access the pickle at /mnt/pickle/pickle.pkl. Note that you can change the file names and extensions to whatever you want. If disk read is not the biggest bottleneck, you might not see a speed increase. If you run out of memory, you can try turning down the size of the ramdisk (I set it to 1536 MB, or 1.5 GB).
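To check whether disk reads really are your bottleneck before setting up the ramdisk, you can time the load itself. A standard-library sketch, using a temporary file as a stand-in for the real paths (swap in your pickle and /mnt/pickle/pickle.pkl to compare the two):

```python
import os
import pickle
import tempfile
import time

# Stand-in payload; in practice you would time your existing pickle file.
payload = {"rows": list(range(100_000))}
path = os.path.join(tempfile.mkdtemp(), "pickle.pkl")
with open(path, "wb") as f:
    pickle.dump(payload, f)

start = time.perf_counter()
with open(path, "rb") as f:
    data = pickle.load(f)
elapsed = time.perf_counter() - start
print(f"load took {elapsed:.4f}s")
```

If the timing barely changes between disk and ramdisk, the overhead is in deserialization rather than I/O, and the ramdisk will not help.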
You can use a ShareableList: one Python program loads the file and keeps the data in memory, while a second Python program reads it from that shared memory. Whatever your data is, you can load it into a dictionary, dump it as JSON, and later reload the JSON.
Program 1:
```python
import pickle
import json
from multiprocessing.managers import SharedMemoryManager

YOUR_DATA = pickle.load(open(DATA_ROOT + pickle_name, 'rb'))
data_dict = {'DATA': YOUR_DATA}
data_dict_json = json.dumps(data_dict)

smm = SharedMemoryManager()
smm.start()
sl = smm.ShareableList(['alpha', 'beta', data_dict_json])
print(sl)
# smm.shutdown()  # commented out for now, but you will need to call it eventually
```
The output will look like this
```
ShareableList(['alpha', 'beta', "your data in json format"], name='psm_12abcd')
```
Now in Program 2:
```python
from multiprocessing import shared_memory

load_from_mem = shared_memory.ShareableList(name='psm_12abcd')
load_from_mem[1]
# OUTPUT: 'beta'
load_from_mem[2]
# OUTPUT: your data in JSON format
```
You can read more here: https://docs.python.org/3/library/multiprocessing.shared_memory.html
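The two programs above can be combined into one self-contained, standard-library sketch (single process here for illustration; a plain dict stands in for the unpickled data, and the segment name is taken from the manager rather than hard-coded, since names like 'psm_12abcd' are assigned at runtime):

```python
import json
from multiprocessing import shared_memory
from multiprocessing.managers import SharedMemoryManager

# "Program 1": load data (a stand-in dict here) and publish it as JSON.
your_data = {"DATA": [1, 2, 3]}
payload = json.dumps(your_data)

smm = SharedMemoryManager()
smm.start()
sl = smm.ShareableList(["alpha", "beta", payload])
name = sl.shm.name  # pass this name to the reader process

# "Program 2": attach to the same list by name and decode the payload.
reader = shared_memory.ShareableList(name=name)
recovered = json.loads(reader[2])

# Clean up: detach the reader, then let the manager free the memory.
reader.shm.close()
smm.shutdown()
```

Note that the shared data lives only as long as the manager process: once `smm.shutdown()` runs, readers can no longer attach.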