deque in python pandas


As noted by dorvak, pandas is not designed for queue-like behaviour.

Below I've replicated deque's simple rolling insert (drop the oldest item, append the newest) for pandas DataFrames, numpy arrays, and HDF5 datasets via the h5py module.

The %timeit results show (unsurprisingly) that collections.deque is by far the fastest. numpy is about five times slower, while h5py and pandas are roughly a thousand times slower than deque.

    from collections import deque
    import pandas as pd
    import numpy as np
    import h5py

    def insert_deque(test_sequence, buffer_deque):
        # drop the oldest item on the left, append the newest on the right
        for item in test_sequence:
            buffer_deque.popleft()
            buffer_deque.append(item)
        return buffer_deque

    def insert_df(test_sequence, buffer_df):
        # shift every row up by one, then overwrite the last row
        for item in test_sequence:
            buffer_df.iloc[0:-1,:] = buffer_df.iloc[1:,:].values
            buffer_df.iloc[-1] = item
        return buffer_df

    def insert_arraylike(test_sequence, buffer_arr):
        # same shift-and-overwrite, for numpy arrays and h5py datasets
        for item in test_sequence:
            buffer_arr[:-1] = buffer_arr[1:]
            buffer_arr[-1] = item
        return buffer_arr

    # 100 rows of 2 values each
    test_sequence = np.array(list(range(100))*2).reshape(100,2)

    # create buffer arrays
    nested_list = [[0]*2]*5
    buffer_deque = deque(nested_list)
    buffer_df = pd.DataFrame(nested_list, columns=('A','B'))
    buffer_arr = np.array(nested_list)

    # calculate speed of each process in ipython
    print("deque : ")
    %timeit insert_deque(test_sequence, buffer_deque)
    print("pandas : ")
    %timeit insert_df(test_sequence, buffer_df)
    print("numpy array : ")
    %timeit insert_arraylike(test_sequence, buffer_arr)
    print("hdf5 with h5py : ")
    with h5py.File("h5py_test.h5", "w") as f:
        f["buffer_hdf5"] = np.array(nested_list)
        %timeit insert_arraylike(test_sequence, f["buffer_hdf5"])

The %timeit results:

deque : 34.1 µs per loop

pandas : 48 ms per loop

numpy array : 187 µs per loop

hdf5 with h5py : 31.7 ms per loop
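
If you are not working in IPython, a comparable measurement can be taken with the standard-library timeit module. The following is only a rough sketch for the deque case: it uses plain lists as stand-ins for the numpy test data, and the run and repeat counts are arbitrary choices of mine, so the absolute numbers will not match the ones above.

```python
import timeit
from collections import deque

def insert_deque(test_sequence, buffer_deque):
    # Same rolling insert as in the answer: drop the oldest, append the newest.
    for item in test_sequence:
        buffer_deque.popleft()
        buffer_deque.append(item)
    return buffer_deque

# Plain-list stand-in for the (100, 2) test array: [0, 1], [2, 3], ...
flat = list(range(100)) * 2
test_sequence = [flat[i:i + 2] for i in range(0, len(flat), 2)]

# 5-row buffer of zeros, built without numpy.
buffer_deque = deque([[0, 0] for _ in range(5)])

# Call the function `runs` times per repeat and keep the best of 5 repeats
# (both counts are assumptions, not what %timeit would pick).
runs = 1000
best = min(timeit.repeat(lambda: insert_deque(test_sequence, buffer_deque),
                         number=runs, repeat=5))
per_loop_us = best / runs * 1e6
print(f"deque : {per_loop_us:.1f} us per loop")
```

The same pattern should work for the pandas and numpy functions by swapping in the matching function and buffer.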

Notes:

My pandas slicing method was only slightly faster than the concat method listed in the question.

The hdf5 format (via h5py) did not show any advantages. I also don't see any advantages of HDFStore, as suggested by Andy.
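
Given the timings above, one option is to keep the rolling buffer in a deque and only convert it when a table is actually needed. Here is a minimal sketch, assuming a fixed buffer length of 5 rows; the names buffer and rows are just for illustration. It relies on the maxlen argument of collections.deque, which discards the oldest item automatically when a full deque is appended to, so the explicit popleft() is no longer needed.

```python
from collections import deque

# Illustrative 5-row buffer of [A, B] pairs, pre-filled with zeros.
buffer = deque([[0, 0] for _ in range(5)], maxlen=5)

# Appending to a full deque with maxlen drops the oldest item on the left.
for i in range(100):
    buffer.append([i, i])

# Snapshot of the most recent rows as a plain list of lists.
rows = list(buffer)
print(rows)
```

If a DataFrame is needed at that point, the snapshot can be passed to pd.DataFrame(rows, columns=['A', 'B']).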