HDF5 min_itemsize error: ValueError: Trying to store a string with len [##] in [y] column but this column has a limit of [##]!
UPDATE:
You have misspelled the data_columns parameter: you passed data_column, but it should be data_columns. As a result your HDF store had no indexed data columns at all, and the columns were packed into generic values_block_X columns:
In [70]: store = pd.HDFStore(r'D:\temp\.data\my_test.h5')

Misspelled parameters are silently ignored:

In [71]: store.append('no_idx_wrong_dc', df, data_column=df.columns, index=False)

In [72]: store.get_storer('no_idx_wrong_dc').table
Out[72]:
/no_idx_wrong_dc/table (Table(10,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
  "values_block_1": Int64Col(shape=(1,), dflt=0, pos=2),
  "values_block_2": StringCol(itemsize=30, shape=(1,), dflt=b'', pos=3)}
  byteorder := 'little'
  chunkshape := (1213,)
This is the same as not specifying data_columns at all:

In [73]: store.append('no_idx_no_dc', df, index=False)

In [74]: store.get_storer('no_idx_no_dc').table
Out[74]:
/no_idx_no_dc/table (Table(10,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
  "values_block_1": Int64Col(shape=(1,), dflt=0, pos=2),
  "values_block_2": StringCol(itemsize=30, shape=(1,), dflt=b'', pos=3)}
  byteorder := 'little'
  chunkshape := (1213,)
Now let's spell it correctly:

In [75]: store.append('no_idx_dc', df, data_columns=df.columns, index=False)

In [76]: store.get_storer('no_idx_dc').table
Out[76]:
/no_idx_dc/table (Table(10,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "value": Float64Col(shape=(), dflt=0.0, pos=1),
  "count": Int64Col(shape=(), dflt=0, pos=2),
  "s": StringCol(itemsize=30, shape=(), dflt=b'', pos=3)}
  byteorder := 'little'
  chunkshape := (1213,)
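The practical payoff of real data columns is that they can be used in where clauses when selecting. A minimal sketch, assuming a hypothetical stand-in DataFrame with the value/count/s columns implied by the schema above, written to a temporary file:

```python
import os
import tempfile
import pandas as pd

# hypothetical stand-in for the df used in the session above
df = pd.DataFrame({
    'value': [0.1, 0.2, 0.3],
    'count': [11, 12, 13],
    's': ['aa', 'bb', 'cc'],
})

path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with pd.HDFStore(path) as store:
    # correctly spelled data_columns -> every column becomes queryable
    store.append('dc', df, data_columns=df.columns.tolist(),
                 min_itemsize={'s': 30})
    # 'count' is a real data column, so it can appear in a where clause;
    # a values_block_X column could not be queried like this
    hits = store.select('dc', where='count > 11')
```

With the misspelled keyword, the same select would fail because no indexed data columns exist.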
OLD Answer:
AFAIK you can effectively set the min_itemsize parameter on the first append only.

Demo:

In [33]: df
Out[33]:
   num                 s
0   11  aaaaaaaaaaaaaaaa
1   12    bbbbbbbbbbbbbb
2   13     ccccccccccccc
3   14       ddddddddddd

In [34]: store = pd.HDFStore(r'D:\temp\.data\my_test.h5')

In [35]: store.append('test_1', df, data_columns=True)

In [36]: store.get_storer('test_1').table.description
Out[36]:
{
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "num": Int64Col(shape=(), dflt=0, pos=1),
  "s": StringCol(itemsize=16, shape=(), dflt=b'', pos=2)}

In [37]: df.loc[4] = [15, 'X'*200]

In [38]: df
Out[38]:
   num                                                  s
0   11                                   aaaaaaaaaaaaaaaa
1   12                                     bbbbbbbbbbbbbb
2   13                                      ccccccccccccc
3   14                                        ddddddddddd
4   15  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX...

In [39]: store.append('test_1', df, data_columns=True)
...skipped...
ValueError: Trying to store a string with len [200] in [s] column but
this column has a limit of [16]!
Consider using min_itemsize to preset the sizes on these columns
Now using min_itemsize, but still appending to the existing store object:

In [40]: store.append('test_1', df, data_columns=True, min_itemsize={'s': 250})
...skipped...
ValueError: Trying to store a string with len [250] in [s] column but
this column has a limit of [16]!
Consider using min_itemsize to preset the sizes on these columns
The following works if we create a new object in our store:
In [41]: store.append('test_2', df, data_columns=True, min_itemsize={'s':250})
Check column sizes:
In [42]: store.get_storer('test_2').table.description
Out[42]:
{
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "num": Int64Col(shape=(), dflt=0, pos=1),
  "s": StringCol(itemsize=250, shape=(), dflt=b'', pos=2)}
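One way to avoid guessing a width on the first write is to derive min_itemsize from the data itself, with some headroom. A sketch; the 2x headroom factor and the 64-byte floor are arbitrary choices for illustration, not pandas defaults:

```python
import pandas as pd

df = pd.DataFrame({'num': [11, 12, 13, 14],
                   's': ['a' * 16, 'b' * 14, 'c' * 13, 'd' * 11]})

# longest string currently in the column
longest = int(df['s'].str.len().max())
# arbitrary headroom: at least 64 bytes, or twice the current maximum
itemsize = max(64, 2 * longest)
# then size the column on the first write, e.g.:
# store.append('test_2', df, data_columns=True, min_itemsize={'s': itemsize})
```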
I started to get this error at around the same time as updating pandas from 0.18.1 to 0.22.0 (although this could be unrelated).
I fixed the error in the existing HDF5 file by manually reading the dataframe in, then writing a new HDF5 file with a larger min_itemsize
for the column mentioned in the error:
filename_hdf5 = r"C:\test.h5"
df = pd.read_hdf(filename_hdf5, 'table_name')
hdf = pd.HDFStore(filename_hdf5)
hdf.put('table_name', df, format='table', data_columns=True,
        min_itemsize={'ColumnNameMentionedInError': 10})
hdf.close()

(Note the raw-string prefix: without it, the \t in "C:\test.h5" is a tab escape. HDFStore also needs to be qualified as pd.HDFStore unless it was imported directly.)
I then updated the existing code to set min_itemsize
on key creation.
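Setting the width at key creation can be sketched as follows (the file path and column names are made up for the example): presetting min_itemsize on the first put lets later appends with much longer strings succeed.

```python
import os
import tempfile
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), 'fixed.h5')

with pd.HDFStore(path) as store:
    # preset the column width on the *first* write of the key
    first = pd.DataFrame({'num': [1], 's': ['short']})
    store.put('t', first, format='table', data_columns=True,
              min_itemsize={'s': 250})
    # a later append with a much longer string now fits within the
    # 250-byte column, instead of raising the ValueError
    store.append('t', pd.DataFrame({'num': [2], 's': ['X' * 200]}))
    out = store.select('t')
```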
Extra for Experts
The error occurs when appending rows to an existing table whose fixed string-column width is too narrow for the new data. That fixed width was set when the table was first written, based on the longest string then present in the column (or on min_itemsize, if given).
Methinks pandas should handle this error transparently, rather than leaving what is effectively a time bomb for all future appends. The issue can take weeks or even years to surface.
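One way to approximate such transparent handling in user code is to catch the ValueError and rewrite the key with a wider column. The append_growing helper below is hypothetical (not part of pandas), and the rewrite copies the entire table, so it only suits modest table sizes:

```python
import os
import tempfile
import pandas as pd

def append_growing(store, key, df, col):
    """Append df to key; if string column `col` no longer fits,
    rewrite the whole key with a wider itemsize.

    Hypothetical helper -- the except clause re-reads everything,
    widens the column to the new maximum, and rewrites the key.
    """
    try:
        store.append(key, df, data_columns=True)
    except ValueError:
        old = store.select(key) if key in store else pd.DataFrame()
        combined = pd.concat([old, df], ignore_index=True)
        width = int(combined[col].str.len().max())
        store.put(key, combined, format='table', data_columns=True,
                  min_itemsize={col: width})

path = os.path.join(tempfile.mkdtemp(), 'grow.h5')
with pd.HDFStore(path) as store:
    append_growing(store, 'k', pd.DataFrame({'s': ['ab']}), 's')
    # a plain append would raise here: the column was sized for 2 bytes
    append_growing(store, 'k', pd.DataFrame({'s': ['X' * 100]}), 's')
    out = store.select('k')
```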