
which is faster for load: pickle or hdf5 in python


I would consider only two storage formats: HDF5 (PyTables) and Feather.

Here are the results of my read and write comparison for a DataFrame (shape: 4,000,000 x 6; size in memory: 183.1 MB; size as uncompressed CSV: 492 MB).

The comparison covers the following storage formats: CSV, CSV.gzip, Pickle, and HDF5 with various compression settings. Times are in seconds, and the size ratio is relative to the uncompressed CSV:

                      read_s  write_s  size_ratio_to_CSV
    storage
    CSV               17.900    69.00              1.000
    CSV.gzip          18.900   186.00              0.047
    Pickle             0.173     1.77              0.374
    HDF_fixed          0.196     2.03              0.435
    HDF_tab            0.230     2.60              0.437
    HDF_tab_zlib_c5    0.845     5.44              0.035
    HDF_tab_zlib_c9    0.860     5.95              0.035
    HDF_tab_bzip2_c5   2.500    36.50              0.011
    HDF_tab_bzip2_c9   2.500    36.50              0.011

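In this test Pickle was the fastest to read and write (0.173 s and 1.77 s), with uncompressed HDF5 in fixed format close behind (0.196 s and 2.03 s). But it might be different for you, because all my data was of the datetime dtype, so it's always better to run such a comparison on your real data, or at least on similar data.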
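If you want to repeat the comparison on your own data, something like the following could serve as a starting point. It is only a minimal sketch, not the script that produced the numbers above: the helper names `time_call` and `benchmark`, the file names, and the small synthetic DataFrame are all made up for illustration. It assumes pandas and NumPy are installed, and it only adds the HDF5 variants if PyTables (`tables`) can be imported.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def benchmark(df, workdir):
    """Time one write and one read per format; return the results as a DataFrame."""
    # name -> (file name, writer, reader); all names are illustrative
    formats = {
        "Pickle": (
            "df.pkl",
            lambda p: df.to_pickle(p),
            lambda p: pd.read_pickle(p),
        ),
    }
    try:
        import tables  # noqa: F401  (PyTables, required by pandas for HDF5)
    except ImportError:
        pass  # HDF5 rows are skipped when PyTables is not available
    else:
        formats["HDF_fixed"] = (
            "df_fixed.h5",
            lambda p: df.to_hdf(p, key="df", mode="w", format="fixed"),
            lambda p: pd.read_hdf(p, key="df"),
        )
        formats["HDF_tab"] = (
            "df_tab.h5",
            lambda p: df.to_hdf(p, key="df", mode="w", format="table"),
            lambda p: pd.read_hdf(p, key="df"),
        )

    rows = []
    for name, (fname, writer, reader) in formats.items():
        path = os.path.join(workdir, fname)
        _, write_s = time_call(writer, path)
        loaded, read_s = time_call(reader, path)
        # Sanity check that the round trip returned a frame of the same shape
        assert loaded.shape == df.shape
        rows.append(
            {
                "storage": name,
                "read_s": read_s,
                "write_s": write_s,
                "size_bytes": os.path.getsize(path),
            }
        )
    return pd.DataFrame(rows).set_index("storage")


# Stand-in data for illustration only; replace with your real DataFrame
n = 100_000
rng = np.random.default_rng(0)
sample = pd.DataFrame(
    {
        "ts": pd.date_range("2020-01-01", periods=n, freq="s"),
        "value": rng.random(n),
    }
)

with tempfile.TemporaryDirectory() as tmp:
    results = benchmark(sample, tmp)

print(results)
```

A single timed call per format is noisy, so for a real decision it is probably worth repeating each read several times and taking the best or the median.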