Using numpy to build an array of all combinations of two arrays Using numpy to build an array of all combinations of two arrays arrays arrays

Using numpy to build an array of all combinations of two arrays


Here's a pure-numpy implementation. It's about 5× faster than using itertools.

Python 3:

import numpy as npdef cartesian(arrays, out=None):    """    Generate a cartesian product of input arrays.    Parameters    ----------    arrays : list of array-like        1-D arrays to form the cartesian product of.    out : ndarray        Array to place the cartesian product in.    Returns    -------    out : ndarray        2-D array of shape (M, len(arrays)) containing cartesian products        formed of input arrays.    Examples    --------    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))    array([[1, 4, 6],           [1, 4, 7],           [1, 5, 6],           [1, 5, 7],           [2, 4, 6],           [2, 4, 7],           [2, 5, 6],           [2, 5, 7],           [3, 4, 6],           [3, 4, 7],           [3, 5, 6],           [3, 5, 7]])    """    arrays = [np.asarray(x) for x in arrays]    dtype = arrays[0].dtype    n = np.prod([x.size for x in arrays])    if out is None:        out = np.zeros([n, len(arrays)], dtype=dtype)    #m = n / arrays[0].size    m = int(n / arrays[0].size)     out[:,0] = np.repeat(arrays[0], m)    if arrays[1:]:        cartesian(arrays[1:], out=out[0:m, 1:])        for j in range(1, arrays[0].size):        #for j in xrange(1, arrays[0].size):            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]    return out

Python 2:

import numpy as npdef cartesian(arrays, out=None):    arrays = [np.asarray(x) for x in arrays]    dtype = arrays[0].dtype    n = np.prod([x.size for x in arrays])    if out is None:        out = np.zeros([n, len(arrays)], dtype=dtype)    m = n / arrays[0].size    out[:,0] = np.repeat(arrays[0], m)    if arrays[1:]:        cartesian(arrays[1:], out=out[0:m, 1:])        for j in xrange(1, arrays[0].size):            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]    return out


In newer version of numpy (>1.8.x), numpy.meshgrid() provides a much faster implementation:

@pv's solution

In [113]:%timeit cartesian(([1, 2, 3], [4, 5], [6, 7]))10000 loops, best of 3: 135 µs per loopIn [114]:cartesian(([1, 2, 3], [4, 5], [6, 7]))Out[114]:array([[1, 4, 6],       [1, 4, 7],       [1, 5, 6],       [1, 5, 7],       [2, 4, 6],       [2, 4, 7],       [2, 5, 6],       [2, 5, 7],       [3, 4, 6],       [3, 4, 7],       [3, 5, 6],       [3, 5, 7]])

numpy.meshgrid() use to be 2D only, now it is capable of ND. In this case, 3D:

In [115]:%timeit np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)10000 loops, best of 3: 74.1 µs per loopIn [116]:np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)Out[116]:array([[1, 4, 6],       [1, 5, 6],       [2, 4, 6],       [2, 5, 6],       [3, 4, 6],       [3, 5, 6],       [1, 4, 7],       [1, 5, 7],       [2, 4, 7],       [2, 5, 7],       [3, 4, 7],       [3, 5, 7]])

Note that the order of the final resultant is slightly different.


itertools.combinations is in general the fastest way to get combinations from a Python container (if you do in fact want combinations, i.e., arrangements WITHOUT repetitions and independent of order; that's not what your code appears to be doing, but I can't tell whether that's because your code is buggy or because you're using the wrong terminology).

If you want something different than combinations perhaps other iterators in itertools, product or permutations, might serve you better. For example, it looks like your code is roughly the same as:

for val in itertools.product(np.arange(0, 1, 0.1), repeat=6):    print F(val)

All of these iterators yield tuples, not lists or numpy arrays, so if your F is picky about getting specifically a numpy array you'll have to accept the extra overhead of constructing or clearing and re-filling one at each step.