How to store wide tables in pytables / hdf5
This might not, in fact, be possible in a naive way. HDF5 reserves 64 KB of header space per dataset for its metadata, and that metadata includes the type of every column. So while the number of columns is only a soft limit in principle, somewhere in the 2-3 thousand column range you typically run out of room to store the metadata (depending on the length of the column names, among other things).
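A rough back-of-the-envelope check illustrates why the limit lands in that range. The 64 KB figure is the HDF5 object-header budget; the per-column byte costs below are assumed ballpark values for illustration, not exact HDF5 numbers:

    # Rough estimate of how many columns fit in HDF5's ~64 KB object header.
    HEADER_BUDGET = 64 * 1024          # bytes available for dataset metadata
    avg_name_len = 10                  # assumed average column-name length
    per_column = avg_name_len + 16     # assumed bytes for name + type descriptor

    max_columns = HEADER_BUDGET // per_column
    print(max_columns)                 # on the order of a few thousand

Longer column names eat the budget faster, which is why the practical ceiling varies from file to file.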
Furthermore, doesn't NumPy cap the number of array dimensions at 32? How are you representing the data with NumPy now? Anything that you can get into a NumPy array should correspond to a PyTables Array class.
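To be precise, NumPy's hard cap of 32 (raised in recent releases) applies to the number of array *dimensions*, not to the number of fields in a structured dtype; a structured array can carry thousands of named "columns". A quick sketch:

    import numpy as np

    # A structured dtype with thousands of fields is legal in NumPy;
    # the resulting array is still 1-D (one record per row).
    n_cols = 5000
    dtype = np.dtype([('col%d' % i, np.float64) for i in range(n_cols)])
    table = np.zeros(10, dtype=dtype)  # 10 rows, 5000 "columns"

    print(len(table.dtype.names))  # 5000
    print(table.ndim)              # 1

So NumPy itself is not the bottleneck here; the dataset-metadata limit in HDF5 is.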
Not PyTables, but with h5py instead, this could work:
    import numpy as np
    import h5py

    data = np.recfromcsv(args[0], delimiter=',', case_sensitive=True,
                         deletechars='', replace_space=' ')
    with h5py.File(args[1], 'w') as h5file:
        h5file.create_dataset('table', data=data)
I borrowed the first line from this answer; I'm not sure if that works for you. The HDF5 table looks fine (from a quick look with HDFView); of course, I don't know whether you can then use it with PyTables, or perhaps pandas.
Perhaps you can increase the number of columns without much performance degradation. See: http://www.pytables.org/docs/manual-2.2.1/apc.html
C.1.1. Recommended maximum values
MAX_COLUMNS
Maximum number of columns in Table objects before a PerformanceWarning is issued. This limit is somewhat arbitrary and can be increased.
If you want to go this route, simply find the parameters.py file in the PyTables installation directory and change the MAX_COLUMNS value.
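Editing parameters.py can also be avoided: PyTables exposes its tunables through the tables.parameters module, so the limit can be raised (or the warning silenced) at runtime before you create the table. A sketch, assuming PyTables is installed and that tables.parameters.MAX_COLUMNS is the relevant soft limit:

    import warnings

    try:
        import tables
        # Raise the soft limit at runtime instead of editing parameters.py ...
        tables.parameters.MAX_COLUMNS = 5000
        # ... or keep the limit and just silence the resulting warning:
        warnings.filterwarnings('ignore', category=tables.PerformanceWarning)
        limit = tables.parameters.MAX_COLUMNS
    except ImportError:
        limit = None  # PyTables not available in this environment
    print(limit)

Either way, remember this only suppresses the PerformanceWarning; the underlying HDF5 metadata limit described above still applies.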