Understanding the syntax of numpy.r_() concatenation


'n,m' tells r_ to concatenate along axis=n, and produce a shape with at least m dimensions:

In [28]: np.r_['0,2', [1,2,3], [4,5,6]]
Out[28]:
array([[1, 2, 3],
       [4, 5, 6]])

So we are concatenating along axis=0, and we would normally therefore expect the result to have shape (6,), but since m=2, we are telling r_ that the shape must be at least 2-dimensional. So instead we get shape (2,3):

In [32]: np.r_['0,2', [1,2,3,], [4,5,6]].shape
Out[32]: (2, 3)
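As a quick check of the point above, this small sketch compares the same inputs with and without the '0,2' directive. The variable names are illustrative only.

```python
import numpy as np

# Without a directive string, r_ joins the 1-D inputs end to end.
flat = np.r_[[1, 2, 3], [4, 5, 6]]

# With '0,2', each input is first made 2-D, then joined along axis 0.
stacked = np.r_['0,2', [1, 2, 3], [4, 5, 6]]

print(flat.shape)     # (6,)
print(stacked.shape)  # (2, 3)
```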

Look at what happens when we increase m:

In [36]: np.r_['0,3', [1,2,3,], [4,5,6]].shape
Out[36]: (2, 1, 3)    # <- 3 dimensions

In [37]: np.r_['0,4', [1,2,3,], [4,5,6]].shape
Out[37]: (2, 1, 1, 3) # <- 4 dimensions
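One way to see where the extra 1s come from is to do the promotion by hand. This sketch assumes that r_ promotes each input much as np.array(..., ndmin=m) does, which prepends axes of length 1; the names m, a, b and manual are illustrative.

```python
import numpy as np

m = 3
# Promote each 1-D input to m dimensions; ndmin prepends axes of length 1.
a = np.array([1, 2, 3], ndmin=m)   # shape (1, 1, 3)
b = np.array([4, 5, 6], ndmin=m)   # shape (1, 1, 3)

# Joining along axis 0 adds the leading sizes: 1 + 1 = 2.
manual = np.concatenate((a, b), axis=0)

print(manual.shape)  # (2, 1, 3)
print(np.array_equal(manual, np.r_['0,3', [1, 2, 3], [4, 5, 6]]))  # True
```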

Anything you can do with r_ can also be done with one of the more readable array-building functions such as np.concatenate, np.row_stack, np.column_stack, np.hstack, np.vstack or np.dstack, though it may also require a call to reshape.
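For instance, the '0,2' example from above can be rebuilt with two of those functions. This is just one possible pairing, not the only equivalent.

```python
import numpy as np

via_r = np.r_['0,2', [1, 2, 3], [4, 5, 6]]

# vstack treats each 1-D input as a row.
via_vstack = np.vstack(([1, 2, 3], [4, 5, 6]))

# concatenate joins end to end, so a reshape is needed afterwards.
via_concat = np.concatenate(([1, 2, 3], [4, 5, 6])).reshape(2, 3)

print(np.array_equal(via_r, via_vstack))  # True
print(np.array_equal(via_r, via_concat))  # True
```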

Even with the extra call to reshape, those other functions may be faster:

In [38]: %timeit np.r_['0,4', [1,2,3,], [4,5,6]]
10000 loops, best of 3: 38 us per loop

In [43]: %timeit np.concatenate(([1,2,3,], [4,5,6])).reshape(2,1,1,3)
100000 loops, best of 3: 10.2 us per loop
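Outside IPython, a comparable measurement can be sketched with the standard timeit module. The function names here are made up for illustration, and the absolute numbers will depend on the machine and NumPy version, so no timings are quoted.

```python
import timeit
import numpy as np

def with_r():
    return np.r_['0,4', [1, 2, 3], [4, 5, 6]]

def with_concatenate():
    return np.concatenate(([1, 2, 3], [4, 5, 6])).reshape(2, 1, 1, 3)

# Both approaches build the same array.
same = np.array_equal(with_r(), with_concatenate())

# Total seconds for 10000 calls of each function.
t_r = timeit.timeit(with_r, number=10000)
t_concat = timeit.timeit(with_concatenate, number=10000)
print(same, t_r, t_concat)
```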


The string '0,2' tells numpy to concatenate along axis 0 (the first axis) and to wrap the elements in enough brackets to ensure a two-dimensional array. Consider the following results:

import numpy as np

# Output below is as recorded in the original answer. Recent NumPy versions
# may raise an error for axis=1, minDim=1, since a 1-D array has no axis 1.
for axis in (0,1):
    for minDim in (1,2,3):
        print(np.r_['{},{}'.format(axis, minDim), [1,2,30, 31], [4,5,6, 61], [7,8,90, 91], [10,11, 12, 13]], 'axis={}, minDim={}\n'.format(axis, minDim))

[ 1  2 30 31  4  5  6 61  7  8 90 91 10 11 12 13] axis=0, minDim=1

[[ 1  2 30 31]
 [ 4  5  6 61]
 [ 7  8 90 91]
 [10 11 12 13]] axis=0, minDim=2

[[[ 1  2 30 31]]

 [[ 4  5  6 61]]

 [[ 7  8 90 91]]

 [[10 11 12 13]]] axis=0, minDim=3

[ 1  2 30 31  4  5  6 61  7  8 90 91 10 11 12 13] axis=1, minDim=1

[[ 1  2 30 31  4  5  6 61  7  8 90 91 10 11 12 13]] axis=1, minDim=2

[[[ 1  2 30 31]
  [ 4  5  6 61]
  [ 7  8 90 91]
  [10 11 12 13]]] axis=1, minDim=3


The paragraph that you've highlighted describes the syntax with two comma-separated integers, which is a special case of the syntax with three comma-separated integers. Once you understand the three-integer syntax, the two-integer syntax falls into place.

The equivalent three comma-separated integers syntax for your example would be:

np.r_['0,2,-1', [1,2,3], [4,5,6]]

In order to provide a better explanation I will change the above to:

np.r_['0,2,-1', [1,2,3], [[4,5,6]]]

The above has two parts:

  1. A comma-separated integer string

  2. Two comma-separated arrays

The comma-separated arrays have the following shapes:

np.array([1,2,3]).shape
(3,)
np.array([[4,5,6]]).shape
(1, 3)

In other words the first 'array' is '1-dimensional' while the second 'array' is '2-dimensional'.

First, the 2 in '0,2,-1' means that each array should be upgraded so that it is at least 2-dimensional. Since the second array is already 2-dimensional, it is not affected. However, the first array is 1-dimensional, and to make it 2-dimensional np.r_ needs to add a 1 to its shape tuple, making it either (1,3) or (3,1). That is where the -1 in '0,2,-1' comes into play: it decides where the extra 1 is placed in the shape tuple of the array. -1 is the default and places the 1 (or 1s, if more dimensions are required) at the front of the shape tuple (I explain why further below). This turns the first array's shape tuple into (1,3), which is the same as the second array's shape tuple. The 0 in '0,2,-1' means that the resulting arrays are concatenated along axis 0.
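The two candidate shapes, (1,3) and (3,1), can be seen by passing a single array and changing only the third integer. This is a minimal sketch; the names front and back are illustrative.

```python
import numpy as np

# Default placement (-1): the new axis goes in front, giving (1, 3).
front = np.r_['0,2,-1', [1, 2, 3]]

# Third integer 0: the original axis stays first, giving (3, 1).
back = np.r_['0,2,0', [1, 2, 3]]

print(front.shape)  # (1, 3)
print(back.shape)   # (3, 1)
```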

Since both arrays now have a shape tuple of (1,3), concatenation is possible: if you set aside the concatenation axis (dimension 0 in the above example, which has size 1 in both arrays), the remaining dimensions are equal (here the remaining dimension has size 3 in both arrays). If this were not the case, the following error would be produced:

ValueError: all the input array dimensions except for the concatenation axis must match exactly
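To trigger this error on purpose, one option is to upgrade the first array to (3,1) instead of (1,3) by using 0 as the third integer. The sketch below assumes that choice; the exact wording of the message may differ between NumPy versions.

```python
import numpy as np

raised = False
try:
    # [1, 2, 3] becomes shape (3, 1); [[4, 5, 6]] stays (1, 3).
    # Setting aside axis 0, the remaining sizes are 1 and 3, which differ.
    np.r_['0,2,0', [1, 2, 3], [[4, 5, 6]]]
except ValueError as exc:
    raised = True
    print(exc)
```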

Now if you concatenate two arrays having the shape (1,3) the resulting array will have shape (1+1,3) == (2,3) and therefore:

np.r_['0,2,-1', [1,2,3], [[4,5,6]]].shape
(2, 3)

When a 0 or a positive integer is used for the third integer in the comma-separated string, that integer determines the start of each array's shape tuple in the upgraded shape tuple (only for those arrays which need to have their dimensions upgraded). For example 0,2,0 means that for arrays requiring a shape upgrade the array's original shape tuple should start at dimension 0 of the upgraded shape tuple. For array [1,2,3] which has a shape tuple (3,) the 1 would be placed after the 3. This would result in a shape tuple equal to (3,1) and as you can see the original shape tuple (3,) starts at dimension 0 of the upgraded shape tuple. 0,2,1 would mean that for [1,2,3] the array's shape tuple (3,) should start at dimension 1 of the upgraded shape tuple. This means that the 1 needs to be placed at dimension 0. The resulting shape tuple would be (1,3).
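The effect is easier to see with one more dimension. In this sketch a single 1-D array is upgraded to 3 dimensions, and the third integer moves the original axis from position 0 to position 2.

```python
import numpy as np

# Collect the upgraded shape for each non-negative start position.
shapes = {}
for start in (0, 1, 2):
    shapes[start] = np.r_['0,3,{}'.format(start), [1, 2, 3]].shape

print(shapes[0])  # (3, 1, 1)
print(shapes[1])  # (1, 3, 1)
print(shapes[2])  # (1, 1, 3)
```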

When a negative number is used for the third integer in the comma-separated string, the integer following the negative sign determines where the original shape tuple should end, counted from the end of the upgraded shape tuple. When the original shape tuple is (3,), 0,2,-1 means that the original shape tuple should end at the last dimension of the upgraded shape tuple, and therefore the 1 would be placed at dimension 0 of the upgraded shape tuple and the upgraded shape tuple would be (1,3). Now (3,) ends at dimension 1 of the upgraded shape tuple, which is also the last dimension of the upgraded shape tuple (the original array is [1,2,3] and the upgraded array is [[1,2,3]]).
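The same experiment with negative values shows the original axis being positioned from the end. This is a small sketch with a single 1-D array upgraded to 3 dimensions.

```python
import numpy as np

# Collect the upgraded shape for each negative end position.
end_shapes = {}
for end in (-1, -2, -3):
    end_shapes[end] = np.r_['0,3,{}'.format(end), [1, 2, 3]].shape

print(end_shapes[-1])  # (1, 1, 3)
print(end_shapes[-2])  # (1, 3, 1)
print(end_shapes[-3])  # (3, 1, 1)
```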

np.r_['0,2', [1,2,3], [4,5,6]]

Is the same as

np.r_['0,2,-1', [1,2,3], [4,5,6]]
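A short check of that equivalence, with illustrative variable names:

```python
import numpy as np

two_ints = np.r_['0,2', [1, 2, 3], [4, 5, 6]]
three_ints = np.r_['0,2,-1', [1, 2, 3], [4, 5, 6]]

print(np.array_equal(two_ints, three_ints))  # True
print(two_ints.shape)                        # (2, 3)
```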

Finally here's an example with more dimensions:

np.r_['2,4,1',[[1,2],[4,5],[10,11]],[7,8,9]].shape
(1, 3, 3, 1)

The comma-separated arrays are:

[[1,2],[4,5],[10,11]] which has shape tuple (3,2)

[7,8,9] which has shape tuple (3,)

Both of the arrays need to be upgraded to 4-dimensional arrays. Each original array's shape tuple needs to start at dimension 1 of the upgraded shape tuple.

Therefore for the first array the shape becomes (1,3,2,1) as 3,2 starts at dimension 1 and because two 1s need to be added to make it 4-dimensional one 1 is placed before the original shape tuple and one 1 after.

Using the same logic the second array's shape tuple becomes (1,3,1,1).
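The placement rule described so far can be written down as a small helper. Note that upgraded_shape is a hypothetical function written for this explanation, not part of NumPy; it only predicts the shape of one input before concatenation, under the rules described above.

```python
import numpy as np

def upgraded_shape(shape, ndmin, start=-1):
    """Hypothetical helper: predict one input's shape after r_ upgrades it."""
    extra = ndmin - len(shape)
    if extra <= 0:
        return tuple(shape)      # already has enough dimensions
    if start < 0:
        start += extra + 1       # negative values count from the end
    # 'start' ones in front, the original shape, then the remaining ones.
    return (1,) * start + tuple(shape) + (1,) * (extra - start)

print(upgraded_shape((3, 2), 4, 1))  # (1, 3, 2, 1)
print(upgraded_shape((3,), 4, 1))    # (1, 3, 1, 1)
print(upgraded_shape((3,), 2))       # (1, 3)
```

Passing a single array to np.r_ with the same directive is one way to compare the prediction against what NumPy actually produces.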

Now the two arrays need to be concatenated using dimension 2 as the concatenation axis. Eliminating dimension 2 from each array's upgraded shape tuple results in the tuple (1,3,1) for both arrays. As the resulting tuples are identical, the arrays can be concatenated, and their sizes along the concatenation axis are summed to produce (1, 3, 2+1, 1) == (1, 3, 3, 1).
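The whole walkthrough can be reproduced by hand with reshape and np.concatenate. This is a sketch of the steps described above rather than of how r_ is implemented internally; the names a, b and manual are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [4, 5], [10, 11]])   # shape (3, 2)
b = np.array([7, 8, 9])                    # shape (3,)

# Upgrade both to 4 dimensions, with the original shape starting at dimension 1.
a4 = a.reshape(1, 3, 2, 1)
b4 = b.reshape(1, 3, 1, 1)

# Concatenate along dimension 2: sizes 2 and 1 add up to 3.
manual = np.concatenate((a4, b4), axis=2)

print(manual.shape)  # (1, 3, 3, 1)
print(np.array_equal(manual, np.r_['2,4,1', a, b]))  # True
```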