Converting numpy arrays of arrays into one whole numpy array
np.concatenate should do the trick:
Make an object array of arrays:
    In [23]: arr=np.empty((4,),dtype=object)
    In [24]: for i in range(4):arr[i]=np.ones((2,2),int)*i
    In [25]: arr
    Out[25]:
    array([array([[0, 0],
           [0, 0]]), array([[1, 1],
           [1, 1]]),
           array([[2, 2],
           [2, 2]]), array([[3, 3],
           [3, 3]])], dtype=object)
    In [28]: np.concatenate(arr)
    Out[28]:
    array([[0, 0],
           [0, 0],
           [1, 1],
           [1, 1],
           [2, 2],
           [2, 2],
           [3, 3],
           [3, 3]])
Or with a reshape:
    In [26]: np.concatenate(arr).reshape(4,2,2)
    Out[26]:
    array([[[0, 0],
            [0, 0]],

           [[1, 1],
            [1, 1]],

           [[2, 2],
            [2, 2]],

           [[3, 3],
            [3, 3]]])
    In [27]: _.shape
    Out[27]: (4, 2, 2)
concatenate effectively treats its input as a list of arrays. So it works regardless of whether this is an object array, a list, or a 3d array.
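To illustrate the point about input types, here is a small sketch that feeds the same four 2x2 blocks to concatenate in three forms. The names as_list and as_3d are only introduced for this illustration.

```python
import numpy as np

# Object array holding four separate 2x2 integer arrays.
arr = np.empty((4,), dtype=object)
for i in range(4):
    arr[i] = np.ones((2, 2), int) * i

as_list = list(arr)        # plain Python list of 2x2 arrays
as_3d = np.array(as_list)  # regular (4, 2, 2) integer array

# concatenate iterates over the first axis in every case and joins
# the 2x2 pieces along axis 0.
from_object = np.concatenate(arr)
from_list = np.concatenate(as_list)
from_3d = np.concatenate(as_3d)

print(from_object.shape, from_list.shape, from_3d.shape)
# (8, 2) (8, 2) (8, 2)
```

In all three cases the result is a plain integer array, not an object array.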
This can't be done simply with a reshape. arr is an array of pointers, pointing to arrays located elsewhere in memory. To get a single 3d array, all of the pieces have to be copied into one buffer. That's what concatenate does: it creates a large empty array and copies each piece into it, but it does so in compiled code.
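The copying described above can be sketched by hand in Python. This is only an illustration of the idea (allocate one buffer, copy each piece in), not how concatenate is implemented internally; the name out is made up for the example.

```python
import numpy as np

arr = np.empty((4,), dtype=object)
for i in range(4):
    arr[i] = np.ones((2, 2), int) * i

# Allocate one contiguous (4, 2, 2) buffer, then copy each piece into it.
out = np.empty((len(arr),) + arr[0].shape, dtype=arr[0].dtype)
for i, piece in enumerate(arr):
    out[i] = piece

# Same values as the concatenate + reshape route.
same = np.array_equal(out, np.concatenate(arr).reshape(4, 2, 2))
print(same)  # True

# The result is a copy: it does not share memory with the original pieces.
shares = np.shares_memory(out, arr[0])
print(shares)  # False
```

The Python loop does the same work as concatenate, only more slowly, because each copy goes through the interpreter.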
np.array does not change it:
    In [37]: np.array(arr).shape
    Out[37]: (4,)
but treating arr as a list of arrays does work (though it is slower than the concatenate version, because np.array analyses its inputs more).
    In [38]: np.array([x for x in arr]).shape
    Out[38]: (4, 2, 2)
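As a related option that the answer above does not mention, np.stack joins a sequence of equally shaped arrays along a new axis, so it gives the (4, 2, 2) result without a separate reshape. A minimal sketch, assuming all elements of arr have the same shape:

```python
import numpy as np

arr = np.empty((4,), dtype=object)
for i in range(4):
    arr[i] = np.ones((2, 2), int) * i

# np.array on the object array leaves it as a 1d array of objects.
unchanged = np.array(arr)
print(unchanged.shape, unchanged.dtype)  # (4,) object

# Iterating over arr hands np.array a list of 2x2 arrays instead.
via_list = np.array([x for x in arr])

# np.stack adds the new leading axis itself.
via_stack = np.stack(arr)

print(via_list.shape, via_stack.shape)  # (4, 2, 2) (4, 2, 2)
```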
Perhaps late to the party, but I believe the most efficient approach is:
np.array(arr.tolist())
To give some idea of how it would work:
    import numpy as np

    N, M, K = 4, 3, 2
    arr = np.empty((N,), dtype=object)
    for i in range(N):
        arr[i] = np.full((M, K), i)

    print(arr)
    # [array([[0, 0],
    #        [0, 0],
    #        [0, 0]])
    #  array([[1, 1],
    #        [1, 1],
    #        [1, 1]])
    #  array([[2, 2],
    #        [2, 2],
    #        [2, 2]])
    #  array([[3, 3],
    #        [3, 3],
    #        [3, 3]])]

    new_arr = np.array(arr.tolist())
    print(new_arr)
    # [[[0 0]
    #   [0 0]
    #   [0 0]]
    #  [[1 1]
    #   [1 1]
    #   [1 1]]
    #  [[2 2]
    #   [2 2]
    #   [2 2]]
    #  [[3 3]
    #   [3 3]
    #   [3 3]]]
...and the timings:
    %timeit np.array(arr.tolist())
    # 100000 loops, best of 3: 2.48 µs per loop
    %timeit np.concatenate(arr).reshape(N, M, K)
    # 100000 loops, best of 3: 3.28 µs per loop
    %timeit np.array([x for x in arr])
    # 100000 loops, best of 3: 3.32 µs per loop
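For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard-library timeit module. The candidates dictionary and the repeat count are arbitrary choices for this example, and the numbers will vary with the machine, the NumPy version and the array sizes.

```python
import timeit

import numpy as np

N, M, K = 4, 3, 2
arr = np.empty((N,), dtype=object)
for i in range(N):
    arr[i] = np.full((M, K), i)

# The three approaches compared above, wrapped as zero-argument callables.
candidates = {
    "np.array(arr.tolist())": lambda: np.array(arr.tolist()),
    "np.concatenate(arr).reshape(N, M, K)": lambda: np.concatenate(arr).reshape(N, M, K),
    "np.array([x for x in arr])": lambda: np.array([x for x in arr]),
}

# Check that all approaches agree before timing them.
results = {label: fn() for label, fn in candidates.items()}
reference = results["np.array(arr.tolist())"]
for label, value in results.items():
    assert np.array_equal(value, reference), label

repeats = 10_000
for label, fn in candidates.items():
    seconds = timeit.timeit(fn, number=repeats)
    print(f"{label}: {seconds / repeats * 1e6:.2f} µs per call")
```

With arrays this small the differences are mostly call overhead, so it is worth re-running with realistic N, M and K before picking one.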
I had the same issue extracting a column from a Pandas DataFrame containing an array in each row:
    joined["ground truth"].values
    # outputs
    array([array([0, 0, 0, 0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0, 0, 0, 0]),
           array([0, 0, 0, 0, 0, 0, 0, 0]), ...,
           array([0, 0, 0, 0, 0, 0, 0, 0]), array([0, 0, 0, 0, 0, 0, 0, 0]),
           array([0, 0, 0, 0, 0, 0, 0, 0])], dtype=object)
np.concatenate didn't help because it merged the arrays into a flat array (same as np.hstack). Instead, I needed to vertically stack them with np.vstack:
    array([[0, 0, 0, ..., 0, 0, 0],
           [0, 0, 0, ..., 0, 0, 0],
           [0, 0, 0, ..., 0, 0, 0],
           ...,
           [0, 0, 0, ..., 0, 0, 0],
           [0, 0, 0, ..., 0, 0, 0],
           [0, 0, 0, ..., 0, 0, 0]])
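The difference can be reproduced without pandas. The sketch below builds a stand-in object array, named col here purely for illustration, whose elements are 1d arrays of length 8, similar to what .values returned above.

```python
import numpy as np

# Stand-in for the DataFrame column: three rows, each a 1d array of length 8.
col = np.empty((3,), dtype=object)
for i in range(3):
    col[i] = np.full(8, i)

# With 1d pieces, concatenate (like hstack) joins them end to end.
flat = np.concatenate(col)
print(flat.shape)  # (24,)

# vstack treats each 1d piece as one row of a 2d result.
stacked = np.vstack(col)
print(stacked.shape)  # (3, 8)
```

The flat result can also be recovered with flat.reshape(3, 8) when every row has the same length.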