
Converting numpy arrays of arrays into one whole numpy array


np.concatenate should do the trick:

Make an object array of arrays:

In [23]: arr = np.empty((4,), dtype=object)

In [24]: for i in range(4): arr[i] = np.ones((2,2), int)*i

In [25]: arr
Out[25]:
array([array([[0, 0],
       [0, 0]]), array([[1, 1],
       [1, 1]]),
       array([[2, 2],
       [2, 2]]), array([[3, 3],
       [3, 3]])], dtype=object)

In [28]: np.concatenate(arr)
Out[28]:
array([[0, 0],
       [0, 0],
       [1, 1],
       [1, 1],
       [2, 2],
       [2, 2],
       [3, 3],
       [3, 3]])

Or with a reshape:

In [26]: np.concatenate(arr).reshape(4,2,2)
Out[26]:
array([[[0, 0],
        [0, 0]],
       [[1, 1],
        [1, 1]],
       [[2, 2],
        [2, 2]],
       [[3, 3],
        [3, 3]]])

In [27]: _.shape
Out[27]: (4, 2, 2)

concatenate effectively treats its input as a list of arrays, so it works the same whether the input is an object array, a list of arrays, or a 3d array.
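To illustrate the point, here is a minimal sketch (the variable names are my own) showing np.concatenate producing the same result from an object array, a plain list, and a 3d array:

```python
import numpy as np

# The same four 2x2 blocks, packaged three different ways.
blocks = [np.ones((2, 2), int) * i for i in range(4)]

obj = np.empty((4,), dtype=object)   # object array of arrays
for i, b in enumerate(blocks):
    obj[i] = b

as_list = blocks                     # plain Python list of arrays
as_3d = np.array(blocks)             # one 3d array, shape (4, 2, 2)

# concatenate sees each as a sequence of (2, 2) arrays.
a = np.concatenate(obj)
b = np.concatenate(as_list)
c = np.concatenate(as_3d)

print(a.shape)                                          # (8, 2)
print(np.array_equal(a, b) and np.array_equal(a, c))    # True
```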

This can't be done simply with a reshape. arr is an array of pointers - pointing to arrays located elsewhere in memory. To get a single 3d array, all of the pieces have to be copied into one buffer. That's what concatenate does - it allocates a large empty array and copies each piece into it, but it does so in compiled code.
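A quick sketch of why: the object array only holds 4 references, not 16 numbers, so reshaping it to (4, 2, 2) fails, while concatenate-then-reshape succeeds because it first copies the data into one buffer:

```python
import numpy as np

arr = np.empty((4,), dtype=object)
for i in range(4):
    arr[i] = np.ones((2, 2), int) * i

# arr has size 4 (four pointers), so it cannot be viewed as 16 elements.
try:
    arr.reshape(4, 2, 2)
except ValueError as e:
    print("reshape failed:", e)   # sizes 4 and 16 don't match

# concatenate copies the pieces into one new buffer, which can then be reshaped.
out = np.concatenate(arr).reshape(4, 2, 2)
print(out.shape)   # (4, 2, 2)
```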


np.array does not change it:

In [37]: np.array(arr).shape
Out[37]: (4,)

but treating arr as a list of arrays does work (though it is slower than the concatenate version, since np.array analyzes its inputs more):

In [38]: np.array([x for x in arr]).shape
Out[38]: (4, 2, 2)


Perhaps late to the party, but I believe the most efficient approach is:

np.array(arr.tolist())

To give some idea of how it would work:

import numpy as np

N, M, K = 4, 3, 2
arr = np.empty((N,), dtype=object)
for i in range(N):
    arr[i] = np.full((M, K), i)

print(arr)
# [array([[0, 0],
#        [0, 0],
#        [0, 0]])
#  array([[1, 1],
#        [1, 1],
#        [1, 1]])
#  array([[2, 2],
#        [2, 2],
#        [2, 2]])
#  array([[3, 3],
#        [3, 3],
#        [3, 3]])]

new_arr = np.array(arr.tolist())
print(new_arr)
# [[[0 0]
#   [0 0]
#   [0 0]]
#  [[1 1]
#   [1 1]
#   [1 1]]
#  [[2 2]
#   [2 2]
#   [2 2]]
#  [[3 3]
#   [3 3]
#   [3 3]]]

...and the timings:

%timeit np.array(arr.tolist())
# 100000 loops, best of 3: 2.48 µs per loop
%timeit np.concatenate(arr).reshape(N, M, K)
# 100000 loops, best of 3: 3.28 µs per loop
%timeit np.array([x for x in arr])
# 100000 loops, best of 3: 3.32 µs per loop


I had the same issue extracting a column from a Pandas DataFrame containing an array in each row:

joined["ground truth"].values
# outputs
array([array([0, 0, 0, 0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0, 0, 0, 0]),
       array([0, 0, 0, 0, 0, 0, 0, 0]), ...,
       array([0, 0, 0, 0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0, 0, 0, 0]),
       array([0, 0, 0, 0, 0, 0, 0, 0])], dtype=object)

np.concatenate didn't help because it merged the arrays into a flat array (same as np.hstack). Instead, I needed to vertically stack them with np.vstack:

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])
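A self-contained sketch of the difference, using a synthetic object array in place of the pandas column (the shapes here are illustrative, not from the original DataFrame):

```python
import numpy as np

# Stand-in for joined["ground truth"].values:
# an object array of five equal-length 1-D arrays.
col = np.empty((5,), dtype=object)
for i in range(5):
    col[i] = np.zeros(8, dtype=int)

flat = np.concatenate(col)   # joins end-to-end into one 1-D array
stacked = np.vstack(col)     # keeps each array as its own row

print(flat.shape)     # (40,)
print(stacked.shape)  # (5, 8)
```

np.vstack promotes each 1-D array to a (1, 8) row before concatenating along the first axis, which is why it recovers the 2-D table that np.concatenate (or np.hstack) flattens away.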