
How to store wide tables in pytables / hdf5


This might not, in fact, be possible to do in a naive way. HDF5 allocates 64 KB of space for metadata for every dataset, and that metadata includes the types of the columns. So while there is no hard limit on the number of columns, somewhere in the 2-3 thousand range you typically run out of space to store the metadata (depending on the length of the column names, etc.).

Furthermore, note that numpy's limit of 32 applies to array dimensions, not columns; a structured array can hold thousands of named fields. How are you representing the data with numpy now? Anything that you can get into a numpy array should correspond to a pytables Array class.
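To illustrate the point above (a minimal numpy-only sketch): the 32 limit is on array dimensions, while a structured dtype can carry far more named fields than that:

    import numpy as np

    # numpy caps the number of array *dimensions* (at 32 in classic numpy),
    # but a structured dtype can have many more named fields than that.
    dt = np.dtype([('col%04d' % i, 'f8') for i in range(3000)])
    arr = np.zeros(10, dtype=dt)
    print(len(arr.dtype.names))  # 3000

Whether pytables can then store such an array as a Table is a separate question, for the metadata reasons given above.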


Not pytables, but with h5py instead, this could work:

    import numpy as np
    import h5py

    data = np.recfromcsv(args[0], delimiter=',',
                         case_sensitive=True, deletechars='', replace_space=' ')
    with h5py.File(args[1], 'w') as h5file:
        h5file.create_dataset('table', data=data)

I borrowed the first line from this answer; not sure if that works for you. The HDF5 table looks fine (from a quick look with hdfview); of course, I don't know if you can use it with pytables and perhaps pandas.
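As a quick sanity check of the round trip (a sketch; assumes h5py and numpy are installed, and substitutes a small in-memory structured array and a temporary file for the CSV and args above):

    import os
    import tempfile

    import numpy as np
    import h5py

    # A small structured array standing in for the recfromcsv result.
    data = np.array([(1, 2.5), (2, 3.5)],
                    dtype=[('id', 'i4'), ('value', 'f8')])

    path = os.path.join(tempfile.mkdtemp(), 'table.h5')
    with h5py.File(path, 'w') as h5file:
        h5file.create_dataset('table', data=data)

    # Read it back and confirm the column names and values survived.
    with h5py.File(path, 'r') as h5file:
        loaded = h5file['table'][:]
    print(loaded.dtype.names)  # ('id', 'value')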


Perhaps you can increase the number of columns without much performance degradation. See: http://www.pytables.org/docs/manual-2.2.1/apc.html

C.1.1. Recommended maximum values

MAX_COLUMNS

Maximum number of columns in Table objects before a PerformanceWarning is issued. This limit is somewhat arbitrary and can be increased.

If you want to go this route, simply find the parameters.py file in the pytables directory and change the MAX_COLUMNS value.
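Rather than editing parameters.py on disk, the same knob is exposed at runtime through the tables.parameters module (a sketch; assumes PyTables is installed):

    import tables

    # Raise the soft limit before creating the wide Table. This only
    # suppresses the PerformanceWarning; it does not change how HDF5
    # stores the per-column metadata discussed above.
    tables.parameters.MAX_COLUMNS = 5000
    print(tables.parameters.MAX_COLUMNS)  # 5000

Setting it in your own code also means the change survives a pytables upgrade, unlike an edit to the installed parameters.py.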