
Creating very large NUMPY arrays in small chunks (PyTables vs. numpy.memmap)


That's weird. np.memmap should work. I've been using it with 250 GB of data on a 12 GB RAM machine without problems.

Does the system really run out of memory at the very moment the memmap file is created, or does it happen later in the code? If it happens at file creation, I really don't know what the problem would be.

When I started using memmap I made some mistakes that led me to run out of memory. For me, something like the code below should work:

import numpy as np

mmapData = np.memmap(mmapFile, mode='w+',
                     shape=(smallarray_size, number_of_arrays),
                     dtype='float64')

for k in range(number_of_arrays):
    smallarray = np.fromfile(list_of_files[k])  # list_of_files holds the file names
    smallarray = do_something_with_array(smallarray)
    mmapData[:, k] = smallarray

It may not be the most efficient way, but it seems to me that it would have the lowest memory usage.

PS: Be aware that the default dtypes of memmap (uint8) and fromfile (float64) are different!
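
For example (a minimal sketch with made-up file names), passing the dtype explicitly to both calls avoids that mismatch:

import numpy as np

# Hypothetical file names; the point is only the explicit dtype on both calls.
chunk = np.fromfile('chunk_000.bin', dtype='float64')   # fromfile defaults to float64
mm = np.memmap('big.dat', mode='w+', dtype='float64',   # memmap defaults to uint8
               shape=chunk.shape)
mm[:] = chunk
mm.flush()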


HDF5 is a C library that can efficiently store large on-disk arrays. Both PyTables and h5py are Python libraries on top of HDF5. If you're using tabular data then PyTables might be preferred; if you have just plain arrays then h5py is probably more stable/simpler.
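
As a rough sketch of the h5py route (the file name, dataset name, and chunk sizes below are made up): create one resizable dataset and append each processed chunk to it, so only one chunk needs to be in RAM at a time.

import numpy as np
import h5py

# Hypothetical sizes for illustration only.
n_chunks, chunk_len = 100, 1_000_000

with h5py.File('big_array.h5', 'w') as f:
    dset = f.create_dataset('data', shape=(0,), maxshape=(None,),
                            dtype='float64', chunks=(chunk_len,))
    for k in range(n_chunks):
        chunk = np.random.rand(chunk_len)              # stand-in for your processed chunk
        dset.resize(dset.shape[0] + chunk_len, axis=0)  # grow the dataset on disk
        dset[-chunk_len:] = chunk                       # only this chunk is in RAM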

There are also out-of-core numpy array solutions that handle the chunking for you. Dask.array would give you plain numpy semantics on top of your collection of chunked files (see the docs on stacking).
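
A rough sketch of what that could look like (the file names and chunk length are made up, and I'm assuming each file holds a 1-D float64 chunk):

import numpy as np
import dask
import dask.array as da

# Hypothetical file list and chunk length, for illustration only.
list_of_files = ['chunk_%03d.bin' % k for k in range(100)]
chunk_len = 1_000_000

# Build one lazy dask array per file, then stack them into a 2-D array.
lazy_chunks = [
    da.from_delayed(dask.delayed(np.fromfile)(fname, dtype='float64'),
                    shape=(chunk_len,), dtype='float64')
    for fname in list_of_files
]
big = da.stack(lazy_chunks, axis=1)   # shape (chunk_len, number_of_files)

# Nothing is loaded yet; computations stream through the chunks.
print(big.mean().compute())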