How to append data to one specific dataset in a hdf5 file with h5py

I have found a solution that seems to work!

Have a look at this: incremental writes to hdf5 with h5py!

To append data to a specific dataset, you first have to resize that dataset along the corresponding axis and then write the new data into the newly added slots at the end of the "old" data.

Thus, the solution looks like this:

    import h5py

    # X_train_data, X_test_data, Y_train_data and Y_test_data hold the new arrays to append
    with h5py.File(r'.\PreprocessedData.h5', 'a') as hf:
        hf["X_train"].resize((hf["X_train"].shape[0] + X_train_data.shape[0]), axis=0)
        hf["X_train"][-X_train_data.shape[0]:] = X_train_data

        hf["X_test"].resize((hf["X_test"].shape[0] + X_test_data.shape[0]), axis=0)
        hf["X_test"][-X_test_data.shape[0]:] = X_test_data

        hf["Y_train"].resize((hf["Y_train"].shape[0] + Y_train_data.shape[0]), axis=0)
        hf["Y_train"][-Y_train_data.shape[0]:] = Y_train_data

        hf["Y_test"].resize((hf["Y_test"].shape[0] + Y_test_data.shape[0]), axis=0)
        hf["Y_test"][-Y_test_data.shape[0]:] = Y_test_data
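The resize-then-write pattern above repeats for every dataset, so it can be wrapped in a small helper. This is only a sketch: `append_rows`, the temporary file `append_demo.h5` and the toy arrays are made up for illustration, and the dataset is assumed to have been created as resizable along axis 0.

```python
import os
import tempfile

import h5py
import numpy as np


def append_rows(dset, new_rows):
    """Grow `dset` along axis 0 and write `new_rows` into the new slots."""
    new_rows = np.asarray(new_rows)
    old_len = dset.shape[0]
    # Make room for the new rows, then fill exactly the added region.
    dset.resize(old_len + new_rows.shape[0], axis=0)
    dset[old_len:] = new_rows


# Demo on a throwaway file (hypothetical name and data).
demo_path = os.path.join(tempfile.mkdtemp(), "append_demo.h5")

with h5py.File(demo_path, "w") as hf:
    hf.create_dataset("X_train", data=np.zeros((3, 2)), chunks=True, maxshape=(None, 2))

with h5py.File(demo_path, "a") as hf:
    append_rows(hf["X_train"], np.ones((2, 2)))
    final_shape = hf["X_train"].shape
    last_rows = hf["X_train"][-2:]

print(final_shape)  # (5, 2)
```

The helper slices from the old length instead of using a negative index. Both address the same rows for a non-empty batch, but since `-0` equals `0` in Python, a negative slice would address the whole dataset if the batch happened to be empty.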

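However, note that the dataset must be created with a maxshape that is None along the axis you want to grow, for example, for one-dimensional data:

    h5f.create_dataset('X_train', data=orig_data, compression="gzip", chunks=True, maxshape=(None,))

Otherwise the dataset cannot be extended. For multi-dimensional data, maxshape needs one entry per dimension, such as maxshape=(None, 64, 64).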
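To see the effect of maxshape, here is a small sketch that creates one dataset without it and one with it, then tries to resize both. The file name `maxshape_demo.h5` and the dataset names `fixed` and `growable` are made up for illustration; the exact exception raised for the fixed dataset may differ between h5py versions, so the code only records that the resize failed.

```python
import os
import tempfile

import h5py
import numpy as np

demo_path = os.path.join(tempfile.mkdtemp(), "maxshape_demo.h5")

with h5py.File(demo_path, "w") as hf:
    # No maxshape: the dataset is fixed at its initial shape.
    fixed = hf.create_dataset("fixed", data=np.arange(5))
    # maxshape=(None,): axis 0 may grow without limit.
    growable = hf.create_dataset("growable", data=np.arange(5),
                                 chunks=True, maxshape=(None,))

    fixed_maxshape = fixed.maxshape
    growable_maxshape = growable.maxshape

    try:
        fixed.resize(10, axis=0)
        fixed_resized = True
    except Exception as exc:  # exception type not assumed here
        fixed_resized = False
        print("resize of 'fixed' failed:", type(exc).__name__)

    growable.resize(10, axis=0)
    growable_shape = growable.shape

print(fixed_maxshape, growable_maxshape, growable_shape)
```

Checking `dset.maxshape` before appending is a cheap way to find out whether an existing file was written in a way that allows growth.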


@Midas.Inc's answer works great. Just to provide a minimal working example for those who are interested:

    import numpy as np
    import h5py

    f = h5py.File('MyDataset.h5', 'a')
    for i in range(10):
        # Data to be appended
        new_data = np.ones(shape=(100, 64, 64)) * i
        new_label = np.ones(shape=(100, 1)) * (i + 1)

        if i == 0:
            # Create the dataset at first
            f.create_dataset('data', data=new_data, compression="gzip", chunks=True, maxshape=(None, 64, 64))
            f.create_dataset('label', data=new_label, compression="gzip", chunks=True, maxshape=(None, 1))
        else:
            # Append new data to it
            f['data'].resize((f['data'].shape[0] + new_data.shape[0]), axis=0)
            f['data'][-new_data.shape[0]:] = new_data

            f['label'].resize((f['label'].shape[0] + new_label.shape[0]), axis=0)
            f['label'][-new_label.shape[0]:] = new_label

        print("I am on iteration {} and 'data' chunk has shape:{}".format(i, f['data'].shape))

    f.close()

The code outputs:

    #I am on iteration 0 and 'data' chunk has shape:(100, 64, 64)
    #I am on iteration 1 and 'data' chunk has shape:(200, 64, 64)
    #I am on iteration 2 and 'data' chunk has shape:(300, 64, 64)
    #I am on iteration 3 and 'data' chunk has shape:(400, 64, 64)
    #I am on iteration 4 and 'data' chunk has shape:(500, 64, 64)
    #I am on iteration 5 and 'data' chunk has shape:(600, 64, 64)
    #I am on iteration 6 and 'data' chunk has shape:(700, 64, 64)
    #I am on iteration 7 and 'data' chunk has shape:(800, 64, 64)
    #I am on iteration 8 and 'data' chunk has shape:(900, 64, 64)
    #I am on iteration 9 and 'data' chunk has shape:(1000, 64, 64)
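One thing to watch with the example: the file is opened in 'a' mode and the datasets are created when i == 0, so running the script a second time against the same file would try to create datasets that already exist. A possible variation, sketched here with a hypothetical helper `create_or_append` and a throwaway file, checks whether the dataset is already in the file and appends in that case:

```python
import os
import tempfile

import h5py
import numpy as np


def create_or_append(h5file, name, batch):
    """Create `name` on first use, otherwise append `batch` along axis 0."""
    batch = np.asarray(batch)
    if name not in h5file:
        # First batch: create a dataset that can grow along axis 0 only.
        h5file.create_dataset(name, data=batch, compression="gzip",
                              chunks=True, maxshape=(None,) + batch.shape[1:])
    else:
        dset = h5file[name]
        old_len = dset.shape[0]
        dset.resize(old_len + batch.shape[0], axis=0)
        dset[old_len:] = batch


demo_path = os.path.join(tempfile.mkdtemp(), "rerun_demo.h5")

# Simulate running the same script twice against the same file.
for run in range(2):
    with h5py.File(demo_path, "a") as f:
        for i in range(3):
            create_or_append(f, "data", np.ones((10, 4, 4)) * i)
        shape_after_run = f["data"].shape

print(shape_after_run)  # (60, 4, 4)
```

Each run adds 3 batches of 10 rows, so after two runs the dataset holds 60 rows instead of failing on the second run.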
