Write a pandas data frame to HDF5


It's difficult to give you a good answer to this rather generic question.

It's not clear how you are going to use (read) your HDF5 files. Do you want to select data conditionally (using the where parameter)?

First of all, you need to open a store object:

import pandas as pd

store = pd.HDFStore('/path/to/filename.h5')

Now you can write (or append) to the store. Here I'm using blosc compression, which is pretty fast and efficient. I'm also passing the data_columns parameter to specify the columns that must be indexed, so that you can use these columns in the where parameter later when you read your HDF5 file:

for f in files:
    # read or process each file in/into a separate `df`
    store.append('df_identifier_AKA_key', df, data_columns=[list_of_indexed_cols], complevel=5, complib='blosc')

store.close()
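
To make the round trip concrete, here is a minimal sketch of appending several frames and then reading back only the matching rows with where. The sample frames, the key "df", the column names "id" and "value", and the temporary file path are all made up for illustration, and it assumes PyTables is installed with blosc support:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the frames built from each file.
frames = [
    pd.DataFrame({
        "id": np.arange(i * 5, (i + 1) * 5),
        "value": np.linspace(0.0, 1.0, 5),
    })
    for i in range(3)
]

# Temporary location used only for this example.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# The context manager closes the store even if an append raises.
with pd.HDFStore(path) as store:
    for df in frames:
        # "id" is declared as a data column so it can be queried later.
        store.append("df", df, data_columns=["id"],
                     complevel=5, complib="blosc")

# Only rows satisfying the condition are read from disk.
subset = pd.read_hdf(path, key="df", where="id >= 10")
print(subset)
```

Only columns listed in data_columns (plus the index) can be used in where, so it is worth deciding up front which columns you will filter on.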