Efficiently Creating A Pandas DataFrame From A Numpy 3d array
Here's one approach that does most of the processing in NumPy before finally outputting a DataFrame, like so -
m,n,r = a.shape
out_arr = np.column_stack((np.repeat(np.arange(m),n),a.reshape(m*n,-1)))
out_df = pd.DataFrame(out_arr)
If you know for certain that the last axis has length 2, so that b and c are the last two columns and a is the first one, you can add column names like so -
out_df = pd.DataFrame(out_arr,columns=['a', 'b', 'c'])
Sample run -
>>> a
array([[[2, 0],
        [1, 7],
        [3, 8]],

       [[5, 0],
        [0, 7],
        [8, 0]],

       [[2, 5],
        [8, 2],
        [1, 2]],

       [[5, 3],
        [1, 6],
        [3, 2]]])
>>> out_df
    a  b  c
0   0  2  0
1   0  1  7
2   0  3  8
3   1  5  0
4   1  0  7
5   1  8  0
6   2  2  5
7   2  8  2
8   2  1  2
9   3  5  3
10  3  1  6
11  3  3  2
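The steps above can be put together as a self-contained sketch. The helper name `array3d_to_frame` and its `columns` parameter are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd


def array3d_to_frame(a, columns=None):
    """Flatten an array of shape (m, n, r) into an (m*n, 1+r) DataFrame.

    Hypothetical helper: the first column holds the position along axis 0,
    the remaining r columns hold the values along the last axis.
    """
    m, n, r = a.shape
    # Position along axis 0, each repeated n times: 0,0,...,1,1,...
    first = np.repeat(np.arange(m), n)
    # Collapse the first two axes so that each row is one length-r vector.
    out_arr = np.column_stack((first, a.reshape(m * n, -1)))
    return pd.DataFrame(out_arr, columns=columns)


a = np.array([[[2, 0], [1, 7], [3, 8]],
              [[5, 0], [0, 7], [8, 0]]])
out_df = array3d_to_frame(a, columns=['a', 'b', 'c'])
print(out_df)
```

Note that `np.column_stack` builds a single array, so the first column shares a common dtype with the data (for example, it becomes float if `a` is float).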
Using Panel (removed in pandas 0.25, so this only works on older versions):
a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
b = pd.Panel(np.rollaxis(a,2)).to_frame()
c = b.set_index(b.index.labels[0]).reset_index()
c.columns = list('abc')
then a is:
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
b is:
             0  1
major minor
0     0      1  2
      1      3  4
1     0      5  6
      1      7  8
and c is:
   a  b  c
0  0  1  2
1  0  3  4
2  1  5  6
3  1  7  8
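Since `Panel` is no longer available in recent pandas releases, a similar result can be sketched with a `MultiIndex` instead. This is a swapped-in alternative, not the method of the answer above; the level names `major` and `minor` are chosen here only to mirror the Panel output.

```python
import numpy as np
import pandas as pd

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
m, n, r = a.shape

# Build a (major, minor) index covering every position on the first two axes.
idx = pd.MultiIndex.from_product([range(m), range(n)], names=['major', 'minor'])
b = pd.DataFrame(a.reshape(m * n, r), index=idx)

# Drop the minor level, then turn the major level into an ordinary column.
c = b.reset_index(level='minor', drop=True).reset_index()
c.columns = list('abc')
print(c)
```

Here `b` plays the role of the intermediate frame produced by `to_frame()`, and `c` is the final three-column result.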