
save numpy array in append mode


The built-in .npy file format is perfectly fine for working with small datasets, without relying on external modules other than numpy.
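
For small arrays, one way to emulate appending with plain .npy files is to load the existing array, concatenate the new rows, and save the result again. The sketch below assumes the whole array fits in memory; the helper name `append_rows` and the file name are made up for illustration.

```python
import os
import tempfile

import numpy as np


def append_rows(path, new_rows):
    """Load the array at `path` (if any), append `new_rows` along axis 0, re-save."""
    new_rows = np.asarray(new_rows)
    if os.path.exists(path):
        existing = np.load(path)
        combined = np.concatenate((existing, new_rows), axis=0)
    else:
        combined = new_rows
    np.save(path, combined)
    return combined.shape


# Usage: write to a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "small.npy")  # hypothetical file name
    append_rows(demo_path, np.zeros((2, 3)))
    append_rows(demo_path, np.ones((1, 3)))
    small_result = np.load(demo_path)

print(small_result.shape)
```

Each call reads and rewrites the whole file, so the cost of an append grows with the size of the data already stored.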

However, once you have large amounts of data, a file format designed to handle such datasets, such as HDF5, is preferable [1].

For instance, here is one way to save numpy arrays in HDF5 with PyTables:

Step 1: Create an extendable EArray storage

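import tables
import numpy as np

filename = 'outarray.h5'
ROW_SIZE = 100      # number of values in each row
NUM_COLUMNS = 200   # number of rows appended below (one per loop iteration)

f = tables.open_file(filename, mode='w')
atom = tables.Float64Atom()

# A first dimension of 0 marks the axis along which the array can grow.
array_c = f.create_earray(f.root, 'data', atom, (0, ROW_SIZE))

for idx in range(NUM_COLUMNS):
    x = np.random.rand(1, ROW_SIZE)
    array_c.append(x)
f.close()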

Step 2: Append rows to an existing dataset (if needed)

f = tables.open_file(filename, mode='a')
f.root.data.append(x)  # x must have shape (n, ROW_SIZE)
f.close()

Step 3: Read back a subset of the data

f = tables.open_file(filename, mode='r')
print(f.root.data[1:10, 2:20])  # e.g. read from disk only this part of the dataset
f.close()


This is an expansion on Mohit Pandey's answer showing a full save / load example. It was tested using Python 3.6 and Numpy 1.11.3.

from pathlib import Path
import numpy as np
import os

p = Path('temp.npy')
with p.open('ab') as f:
    np.save(f, np.zeros(2))
    np.save(f, np.ones(2))

with p.open('rb') as f:
    fsz = os.fstat(f.fileno()).st_size
    out = np.load(f)
    while f.tell() < fsz:
        out = np.vstack((out, np.load(f)))

out = array([[ 0., 0.], [ 1., 1.]])
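
Because each `np.save` call writes a complete record (header plus data), a file written this way is really a sequence of independent arrays. Stacking inside the read loop copies the accumulated result on every iteration; a possible variant, sketched here on the assumption that all records have the same shape, collects them in a list and stacks once at the end. The function name `load_all` is hypothetical.

```python
import os
import tempfile

import numpy as np


def load_all(path):
    """Read every array saved back-to-back in `path` and stack them once."""
    chunks = []
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Each np.load consumes exactly one record and advances the file position.
        while f.tell() < file_size:
            chunks.append(np.load(f))
    return np.vstack(chunks)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "temp.npy")
    # Reopen in 'ab' mode each time to mimic appends from separate runs.
    for value in (0.0, 1.0, 2.0):
        with open(path, "ab") as f:
            np.save(f, np.full(2, value))
    stacked = load_all(path)

print(stacked)
```

Note that a file with several records is not a regular single-array .npy file: a plain `np.load('temp.npy')` returns only the first record. An empty file would make `np.vstack` fail, so a real helper would need to handle that case.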


I made a library for creating NumPy .npy files that are larger than the main memory of the machine by appending along axis 0. The file can then be read with mmap_mode="r".

https://pypi.org/project/npy-append-array

Installation:

pip install npy-append-array

Example:

from npy_append_array import NpyAppendArray
import numpy as np

arr1 = np.array([[1,2],[3,4]])
arr2 = np.array([[1,2],[3,4],[5,6]])

filename = 'out.npy'

# optional, .append will create the file automatically if it does not exist
np.save(filename, arr1)

npaa = NpyAppendArray(filename)
npaa.append(arr2)
npaa.append(arr2)
npaa.append(arr2)

data = np.load(filename, mmap_mode="r")

print(data)
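
Reading with `mmap_mode="r"` is a standard NumPy feature and works on any regular .npy file, independent of how it was written. A minimal sketch using only NumPy (the file name and array contents are made up for illustration): the returned object is a `numpy.memmap`, and slicing it reads only the requested part from disk.

```python
import os
import shutil
import tempfile

import numpy as np

tmpdir = tempfile.mkdtemp()
big_path = os.path.join(tmpdir, "big.npy")  # hypothetical file name

np.save(big_path, np.arange(12).reshape(6, 2))

# mmap_mode="r" maps the file read-only instead of loading it into memory.
mapped = np.load(big_path, mmap_mode="r")
is_memmap = isinstance(mapped, np.memmap)
mapped_shape = mapped.shape

# Slicing touches only the requested rows; np.array copies them into RAM.
last_rows = np.array(mapped[-2:])

print(is_memmap, mapped_shape, last_rows.tolist())

# Drop the mapping before deleting the file (required on some platforms).
del mapped
shutil.rmtree(tmpdir, ignore_errors=True)
```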