Writing & Appending arrays of float to the only dataset in hdf5 file in C++


How to append the data in this case? What kind of function must I use?

You must use hyperslabs. They are what you need to write to only part of a dataset. The function to do that is H5Sselect_hyperslab. Use it on fd1 and use fd1 as your file dataspace in your H5Dwrite call.

I have tried put infinity flag of HDF5 in, but the runtime execution complains.

You need to create a chunked dataset in order to be able to set its maximum size to infinity. Create a dataset creation property list and use H5Pset_layout to make it chunked. Use H5Pset_chunk to set the chunk size. Then create your dataset using this property list.

I don't want to calculate the data that I have each time; is there a way to just simply keep on adding data in, without caring the value of fdim?

You can do two things:

  1. Precompute the final size so you can create a dataset big enough. It looks like that's what you are doing.

  2. Extend your dataset as you go using H5Dset_extent. For this you need to set the maximum dimensions to infinity so you need a chunked dataset (see above).

In both cases, you need to select a hyperslab on the file dataspace in your H5Dwrite call (see above).

Walkthrough working code

#include <iostream>
#include <hdf5.h>

// Constants
const char saveFilePath[] = "test.h5";
const hsize_t ndims = 2;
const hsize_t ncols = 3;

int main()
{

First, create an HDF5 file.

    hid_t file = H5Fcreate(saveFilePath, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    std::cout << "- File created" << std::endl;

Then create a 2D dataspace. The size of the first dimension is unlimited. We set it initially to 0 to show how you can extend the dataset at each step. You could also set it to the size of the first buffer you are going to write, for instance. The size of the second dimension is fixed.

    hsize_t dims[ndims] = {0, ncols};
    hsize_t max_dims[ndims] = {H5S_UNLIMITED, ncols};
    hid_t file_space = H5Screate_simple(ndims, dims, max_dims);
    std::cout << "- Dataspace created" << std::endl;

Then create a dataset creation property list. The layout of the dataset has to be chunked when using unlimited dimensions. The choice of the chunk size affects performance, both in time and disk space. If the chunks are very small, you will have a lot of overhead. If they are too large, you might allocate space that you don't need and your files might end up being too large. This is a toy example, so we will choose chunks of two lines.

    hid_t plist = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_layout(plist, H5D_CHUNKED);
    hsize_t chunk_dims[ndims] = {2, ncols};
    H5Pset_chunk(plist, ndims, chunk_dims);
    std::cout << "- Property list created" << std::endl;
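To make the chunk size trade-off a bit more concrete, here is a rough back-of-the-envelope sketch. It is not an HDF5 call: estimate_chunks and ChunkEstimate are hypothetical names, and the sketch assumes that every chunk touched by the data is stored at its full size (no compression filter) and ignores the per-chunk metadata overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, for illustration only (not part of the HDF5 API).
struct ChunkEstimate {
    std::uint64_t chunks;           // number of chunks touched by the data
    std::uint64_t allocated_bytes;  // chunks * bytes per chunk
    std::uint64_t used_bytes;       // bytes of actual data
};

// Estimate storage for nrows x ncols elements stored in chunks of
// chunk_rows x ncols elements, each element being elem_size bytes.
// Assumption: each touched chunk is stored in full, without filters.
ChunkEstimate estimate_chunks(std::uint64_t nrows, std::uint64_t ncols,
                              std::uint64_t chunk_rows, std::size_t elem_size)
{
    assert(chunk_rows > 0);
    // Ceiling division: a partially filled last chunk still counts as one chunk.
    const std::uint64_t chunks = (nrows + chunk_rows - 1) / chunk_rows;
    const std::uint64_t chunk_bytes = chunk_rows * ncols * elem_size;
    return {chunks, chunks * chunk_bytes, nrows * ncols * elem_size};
}
```

Under these assumptions, five lines of three 4-byte floats stored in chunks of two lines give 3 chunks and 72 bytes allocated for 60 bytes of data. With chunks of 1000 lines, the same data would occupy a single chunk of 12000 bytes.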

Create the dataset.

    hid_t dset = H5Dcreate(file, "dset1", H5T_NATIVE_FLOAT, file_space, H5P_DEFAULT, plist, H5P_DEFAULT);
    std::cout << "- Dataset 'dset1' created" << std::endl;

Close resources. The dataset is now created, so we don't need the property list anymore. We don't need the file dataspace either: once the dataset is extended, this handle becomes stale because it still holds the previous extent. So we will have to grab the updated file dataspace anyway.

    H5Pclose(plist);
    H5Sclose(file_space);

We will now append two buffers to the end of the dataset. The first one will be two lines long. The second one will be three lines long.

First buffer

We create a 2D buffer (contiguous in memory, row-major order). We will allocate enough memory to store 3 lines, so we can reuse the buffer. Let us create an array of pointers so we can use the b[i][j] notation instead of buffer[i * ncols + j]. This is purely aesthetic.

    hsize_t nlines = 3;
    float *buffer = new float[nlines * ncols];
    float **b = new float*[nlines];
    for (hsize_t i = 0; i < nlines; ++i){
        b[i] = &buffer[i * ncols];
    }
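If you would rather avoid the manual new/delete pair, one possible alternative is to keep the data in a std::vector and hide the i * ncols + j arithmetic behind an accessor. This is only a sketch: RowMajor is a hypothetical class, not something the walkthrough or HDF5 provides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper, for illustration: a contiguous row-major 2D buffer.
class RowMajor {
public:
    RowMajor(std::size_t nrows, std::size_t ncols)
        : ncols_(ncols), data_(nrows * ncols, 0.0f) {}

    // Element (i, j) lives at offset i * ncols + j in the underlying storage.
    float &operator()(std::size_t i, std::size_t j)
    {
        assert(j < ncols_ && i * ncols_ + j < data_.size());
        return data_[i * ncols_ + j];
    }

    // Pointer to the first element, usable wherever a plain float* buffer
    // is expected (for example as the buffer argument of H5Dwrite).
    float *data() { return data_.data(); }

private:
    std::size_t ncols_;
    std::vector<float> data_;
};
```

With this, RowMajor b(3, 3); b(0, 1) = 0.2f; plays the role of b[0][1] = 0.2, and b.data() plays the role of buffer. The memory is released automatically when b goes out of scope.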

Initial values in buffer to be written in the dataset:

    b[0][0] = 0.1; b[0][1] = 0.2; b[0][2] = 0.3;
    b[1][0] = 0.4; b[1][1] = 0.5; b[1][2] = 0.6;

We create a memory dataspace to indicate the size of our buffer in memory. Remember the first buffer is only two lines long.

    dims[0] = 2;
    dims[1] = ncols;
    hid_t mem_space = H5Screate_simple(ndims, dims, NULL);
    std::cout << "- Memory dataspace created" << std::endl;

We now need to extend the dataset. We set its initial size to 0x3, so it has to be extended before anything can be written. Note that we extend the dataset itself, not its dataspace. Remember the first buffer is only two lines long.

    dims[0] = 2;
    dims[1] = ncols;
    H5Dset_extent(dset, dims);
    std::cout << "- Dataset extended" << std::endl;

Select hyperslab on file dataset.

    file_space = H5Dget_space(dset);
    hsize_t start[2] = {0, 0};
    hsize_t count[2] = {2, ncols};
    H5Sselect_hyperslab(file_space, H5S_SELECT_SET, start, NULL, count, NULL);
    std::cout << "- First hyperslab selected" << std::endl;

Write buffer to dataset. mem_space and file_space should now have the same number of elements selected. Note that buffer and &b[0][0] are equivalent.

    H5Dwrite(dset, H5T_NATIVE_FLOAT, mem_space, file_space, H5P_DEFAULT, buffer);
    std::cout << "- First buffer written" << std::endl;

We can now close the file dataspace. We could close the memory dataspace now and create a new one for the second buffer, but we will simply update its size.

    H5Sclose(file_space);

Second buffer

New values in buffer to be appended to the dataset:

    b[0][0] = 1.1; b[0][1] = 1.2; b[0][2] = 1.3;
    b[1][0] = 1.4; b[1][1] = 1.5; b[1][2] = 1.6;
    b[2][0] = 1.7; b[2][1] = 1.8; b[2][2] = 1.9;

Resize the memory dataspace to indicate the new size of our buffer. The second buffer is three lines long.

    dims[0] = 3;
    dims[1] = ncols;
    H5Sset_extent_simple(mem_space, ndims, dims, NULL);
    std::cout << "- Memory dataspace resized" << std::endl;

Extend dataset. Note that in this simple example, we know that 2 + 3 = 5. In general, you could read the current extent from the file dataspace and add the desired number of lines to it.

    dims[0] = 5;
    dims[1] = ncols;
    H5Dset_extent(dset, dims);
    std::cout << "- Dataset extended" << std::endl;

Select hyperslab on file dataset. Again, in this simple example, we know that 0 + 2 = 2. In general, you could read the current extent from the file dataspace. The second buffer is three lines long.

    file_space = H5Dget_space(dset);
    start[0] = 2;
    start[1] = 0;
    count[0] = 3;
    count[1] = ncols;
    H5Sselect_hyperslab(file_space, H5S_SELECT_SET, start, NULL, count, NULL);
    std::cout << "- Second hyperslab selected" << std::endl;

Append buffer to dataset

    H5Dwrite(dset, H5T_NATIVE_FLOAT, mem_space, file_space, H5P_DEFAULT, buffer);
    std::cout << "- Second buffer written" << std::endl;

The end: let's close all the resources:

    delete[] b;
    delete[] buffer;
    H5Sclose(file_space);
    H5Sclose(mem_space);
    H5Dclose(dset);
    H5Fclose(file);
    std::cout << "- Resources released" << std::endl;
}
