Access HDF files stored on S3 in pandas
Newer versions of pandas can read an HDF5 file directly from S3, as mentioned in the read_hdf
documentation, so consider upgrading pandas if you can. This of course assumes you have set up the access rights needed to read those files, either with a credentials
file or with public ACLs.
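
If the direct read does not work in your setup, one fallback is to download the object to a temporary file first and read it locally. This is only a minimal sketch: it assumes boto3 and PyTables are installed and that your credentials are already configured. `split_s3_url` and `read_hdf_from_s3` are hypothetical helper names, not part of pandas.

```python
import os
import tempfile
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/some/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def read_hdf_from_s3(url, key=None):
    """Download an HDF5 object to a temporary file, then read it with pandas."""
    # Imported here so the URL helper above works without these packages.
    import boto3
    import pandas as pd

    bucket, object_key = split_s3_url(url)
    with tempfile.TemporaryDirectory() as tmpdir:
        local_path = os.path.join(tmpdir, os.path.basename(object_key) or "data.h5")
        # Credentials are resolved by boto3 (credentials file, env vars, ...).
        boto3.client("s3").download_file(bucket, object_key, local_path)
        # The data is loaded into memory before the temporary file is removed.
        return pd.read_hdf(local_path, key=key)
```

You would then call something like `read_hdf_from_s3("s3://my-bucket/data/frames.h5", key="df")`, where the bucket, path and key are placeholders for your own.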
Regarding your last comment, I am not sure why storing several HDF5 files per df would necessarily be an argument against using HDF5. Pickle should be much slower than HDF5, though joblib.dump
might partially improve on this.