Pythonic way to create a numpy array from a list of numpy arrays

A convenient way is numpy.concatenate. I believe it's also faster than @unutbu's answer:

In [32]: import numpy as np

In [33]: list_of_arrays = list(map(lambda x: x * np.ones(2), range(5)))

In [34]: list_of_arrays
Out[34]:
[array([ 0.,  0.]),
 array([ 1.,  1.]),
 array([ 2.,  2.]),
 array([ 3.,  3.]),
 array([ 4.,  4.])]

In [37]: shape = list(list_of_arrays[0].shape)

In [38]: shape
Out[38]: [2]

In [39]: shape[:0] = [len(list_of_arrays)]

In [40]: shape
Out[40]: [5, 2]

In [41]: arr = np.concatenate(list_of_arrays).reshape(shape)

In [42]: arr
Out[42]:
array([[ 0.,  0.],
       [ 1.,  1.],
       [ 2.,  2.],
       [ 3.,  3.],
       [ 4.,  4.]])
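The shape bookkeeping in the session above can be condensed: passing -1 to reshape lets NumPy infer the trailing dimension. A minimal sketch of the same concatenate-and-reshape idea, for this case of equal-length 1-D arrays:

```python
import numpy as np

list_of_arrays = [x * np.ones(2) for x in range(5)]

# concatenate flattens into one long 1-D array; reshape restores the rows,
# with -1 telling NumPy to infer the column count
arr = np.concatenate(list_of_arrays).reshape(len(list_of_arrays), -1)
print(arr.shape)  # (5, 2)
```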


Suppose you know that the final array arr will never be larger than 5000x10. Then you could pre-allocate an array of maximum size, populate it with data as you go through the loop, and then use arr.resize to cut it down to the discovered size after exiting the loop.
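That pattern might be sketched as follows; the 5000x10 bound and the stand-in data source are placeholders for whatever your loop actually produces:

```python
import numpy as np

MAX_ROWS, COLS = 5000, 10  # known upper bound on the final array size

def collect(rows):
    arr = np.empty((MAX_ROWS, COLS))  # pre-allocate the maximum size once
    n = 0
    for row in rows:
        arr[n] = row                  # fill rows as the loop produces them
        n += 1
    arr.resize((n, COLS))             # shrink to the discovered size
    return arr

# stand-in data source; in practice this is whatever the loop computes
result = collect(x * np.ones(COLS) for x in range(123))
print(result.shape)  # (123, 10)
```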

The tests below suggest that doing so will be slightly faster than constructing intermediate Python lists, no matter what the ultimate size of the array is.

Also, arr.resize de-allocates the unused memory, so the final (though maybe not the intermediate) memory footprint is smaller than what is used by python_lists_to_array.

This shows numpy_all_the_way is faster:

% python -mtimeit -s"import test" "test.numpy_all_the_way(100)"
100 loops, best of 3: 1.78 msec per loop
% python -mtimeit -s"import test" "test.numpy_all_the_way(1000)"
100 loops, best of 3: 18.1 msec per loop
% python -mtimeit -s"import test" "test.numpy_all_the_way(5000)"
10 loops, best of 3: 90.4 msec per loop
% python -mtimeit -s"import test" "test.python_lists_to_array(100)"
1000 loops, best of 3: 1.97 msec per loop
% python -mtimeit -s"import test" "test.python_lists_to_array(1000)"
10 loops, best of 3: 20.3 msec per loop
% python -mtimeit -s"import test" "test.python_lists_to_array(5000)"
10 loops, best of 3: 101 msec per loop

This shows numpy_all_the_way uses less memory:

% test.py
Initial memory usage: 19788
After python_lists_to_array: 20976
After numpy_all_the_way: 20348

test.py:

import numpy as np
import os

def memory_usage():
    pid = os.getpid()
    return next(line for line in open('/proc/%s/status' % pid).read().splitlines()
                if line.startswith('VmSize')).split()[-2]

N, M = 5000, 10

def python_lists_to_array(k):
    list_of_arrays = list(map(lambda x: x * np.ones(M), range(k)))
    arr = np.array(list_of_arrays)
    return arr

def numpy_all_the_way(k):
    arr = np.empty((N, M))
    for x in range(k):
        arr[x] = x * np.ones(M)
    arr.resize((k, M))
    return arr

if __name__ == '__main__':
    print('Initial memory usage: %s' % memory_usage())
    arr = python_lists_to_array(5000)
    print('After python_lists_to_array: %s' % memory_usage())
    arr = numpy_all_the_way(5000)
    print('After numpy_all_the_way: %s' % memory_usage())


Even simpler than @Gill Bates' answer, here is a one-liner:

np.stack(list_of_arrays, axis=0)
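For the example from the first answer, np.stack gives the same 5x2 result; a quick self-contained check:

```python
import numpy as np

list_of_arrays = [x * np.ones(2) for x in range(5)]

# np.stack joins the arrays along a new leading axis, so a list of k
# arrays of shape (2,) becomes one array of shape (k, 2)
arr = np.stack(list_of_arrays, axis=0)
print(arr.shape)  # (5, 2)
```

Note that np.stack requires all input arrays to have the same shape, whereas np.concatenate only requires matching shapes along the non-concatenated axes.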