'/' in names in HDF5 files confusion


I've browsed a bit through the h5check source and I can't find any place where it tests if a name contains a slash. You can examine the error messages it can produce with:

grep error_push h5checker.c -A1

The links you provided clearly state that slashes are not allowed in object names. So yes, I think you've made a file that is illegal but passes h5check. The tool seems to focus more on the binary data layout. The closest related check I can find is a guard against duplicate names.

In my opinion that's all there is to it. The fact that h5py and other libraries somehow are able to create or read this illegal file is irrelevant. The spec says "don't put slashes in object names", so you don't. End of story.

If you're not convinced, think of it like this: if you somehow managed to create a regular file with a slash in its file name, what would happen? Most programs assume that file names contain no slashes, and thus that they can partition a directory path by splitting it at the slash characters. Your file would break this behavior and so introduce many subtle (and not so subtle) bugs. Users would complain, programmers would hate you, system administrators would curse you.

Likewise, it's safe to assume that, besides PyTables, many other libraries and programs will not be able to handle slashes in variable names. The nice thing about HDF5 is that so many tools exist for it, and by using slashes you throw away that advantage. You may think that this is not important; perhaps your HDF5 files are for internal use only. However, the situation may change in 5 years, as situations tend to do.

Just bite the bullet and replace '/' with '|' before writing your variables to HDF5. Replace them back when you read them. The time you lose by implementing this, you'll win back x-fold (for x>1) by avoiding future bugs and user complaints.
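The replace-and-restore step could look roughly like the sketch below. The helper names (encode_name, decode_name) and the sample variable name are made up for illustration. The sketch also assumes that '|' never occurs in your original names, because otherwise the round trip would be ambiguous, so it refuses such names outright.

```python
# Minimal sketch: swap '/' for a placeholder before writing, swap it back on read.
# Assumption: '|' does not appear in any original variable name.
SLASH_PLACEHOLDER = "|"


def encode_name(name: str) -> str:
    """Return a name that is safe to store as an HDF5 object name."""
    if SLASH_PLACEHOLDER in name:
        # Decoding could not tell this '|' apart from an encoded '/'.
        raise ValueError(f"name already contains {SLASH_PLACEHOLDER!r}: {name!r}")
    return name.replace("/", SLASH_PLACEHOLDER)


def decode_name(stored: str) -> str:
    """Recover the original name from the stored one."""
    return stored.replace(SLASH_PLACEHOLDER, "/")


original = "extend/pressure"      # hypothetical variable name
stored = encode_name(original)    # what you would write to the file
restored = decode_name(stored)    # what you hand back to the caller
```

If '|' can legitimately appear in your names, pick a different placeholder or a proper escaping scheme; the point is only that the mapping must be reversible.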

Sorry about the rant but I hope to have convinced you.


Could you use h5py to read through all your files and rewrite them without the offending characters, so that PyTables can read them?

If it is outside the spec, I assume what you are experiencing is just that some implementations handle it and others do not...


Make sure you are creating groups rather than just passing the whole path as the name outright; this is probably where the fault creeps in. If you create the groups leading to your objects and then name the objects with the leaf names (extend_pressure in the above), you won't have any problems anywhere.
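As a rough h5py sketch of that advice: create (or reuse) the group first, then store the object under its leaf name only. The group path "well_1/sensors", the helper write_leaf and the sample data are invented for the example; only extend_pressure comes from the question. The file is kept in memory via the "core" driver so nothing is written to disk.

```python
import h5py


def write_leaf(h5file, group_path, leaf_name, data):
    """Create the group(s) first, then store the dataset under its leaf name."""
    if "/" in leaf_name:
        # A leaf name is a single path component, so it must not contain '/'.
        raise ValueError(f"leaf name must not contain '/': {leaf_name!r}")
    group = h5file.require_group(group_path)  # creates the group if it is missing
    return group.create_dataset(leaf_name, data=data)


# In-memory file; "groups_demo.h5" is only a label here, not a file on disk.
with h5py.File("groups_demo.h5", "w", driver="core", backing_store=False) as f:
    write_leaf(f, "well_1/sensors", "extend_pressure", [0.0, 1.0, 2.0])

    names = []
    f.visit(names.append)  # collect the path of every object in the file

    values = f["well_1/sensors/extend_pressure"][...].tolist()
```

Here every '/' in the full path separates real groups, so the result should be readable by any tool that follows the HDF5 naming rules.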

h5py is a pretty thin wrapper around the C HDF5 library. pandas and PyTables are a lot more heavyweight in approach, or at least they have a lot more of their own semantics going on, and so they check that you don't have '/' in your object names. But keep in mind that everybody is using the HDF5 C library at the end of the day: while HDF5 is great, it would be a huge effort to make an alternative implementation, beyond the resources of pandas or PyTables.

Minor disclaimer: I've hacked on the internals of HDF5 and h5py before.