
# Better way to shuffle two numpy arrays in unison

You can use NumPy's array indexing:

```python
import numpy

def unison_shuffled_copies(a, b):
    assert len(a) == len(b)
    # One random permutation of the indices, applied to both arrays
    p = numpy.random.permutation(len(a))
    return a[p], b[p]
```

This creates two new arrays shuffled in unison. Because indexing with an index array returns copies, the original `a` and `b` are left unchanged.
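As a quick sanity check of the indexing approach, here is a minimal sketch on made-up sample data (the array contents are purely illustrative). It confirms that each row keeps its partner after shuffling and that the inputs are not modified.

```python
import numpy as np

def unison_shuffled_copies(a, b):
    assert len(a) == len(b)
    p = np.random.permutation(len(a))
    return a[p], b[p]

# Hypothetical sample data: row i of `a` starts with 2 * i, and b[i] == i
a = np.arange(10).reshape(5, 2)
b = np.arange(5)

a_shuf, b_shuf = unison_shuffled_copies(a, b)

# Every shuffled row still sits next to its original partner
pairs_intact = np.array_equal(a_shuf[:, 0], 2 * b_shuf)

# The inputs were not modified; the results are independent copies
originals_untouched = (np.array_equal(a, np.arange(10).reshape(5, 2))
                       and np.array_equal(b, np.arange(5)))
```

The function allocates a new array for each input, which matters for memory only when the arrays are very large.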

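If scikit-learn is available, `sklearn.utils.shuffle` shuffles several arrays with the same permutation:

```python
import numpy as np
from sklearn.utils import shuffle

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])

# random_state makes the shuffle reproducible
X, y = shuffle(X, y, random_state=0)
```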

Your "scary" solution does not appear scary to me. Calling `shuffle()` for two sequences of the same length results in the same number of calls to the random number generator, and these are the only "random" elements in the shuffle algorithm. By resetting the state, you ensure that the calls to the random number generator will give the same results in the second call to `shuffle()`, so the whole algorithm will generate the same permutation.
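For reference, the state-resetting approach discussed above can be sketched as follows. This is my reading of the idea rather than a quote of the asker's code, and the sample arrays are made up.

```python
import numpy as np

# Hypothetical sample data: b[i] == 10 * a[i], so the pairing is easy to verify
a = np.arange(10)
b = np.arange(10) * 10

# Save the global RNG state, shuffle `a`, restore the state, then shuffle `b`.
rng_state = np.random.get_state()
np.random.shuffle(a)
np.random.set_state(rng_state)
np.random.shuffle(b)

# Both shuffles consumed identical random numbers, so the same
# permutation was applied to both arrays.
same_permutation = np.array_equal(b, a * 10)
```

Both shuffles happen in place, so no copies of the data are made.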

If you don't like this, a different solution would be to store your data in one array instead of two right from the beginning, and create two views into this single array simulating the two arrays you have now. You can use the single array for shuffling and the views for all other purposes.

Example: Let's assume the arrays `a` and `b` look like this:

```python
a = numpy.array([[[  0.,   1.,   2.],
                  [  3.,   4.,   5.]],
                 [[  6.,   7.,   8.],
                  [  9.,  10.,  11.]],
                 [[ 12.,  13.,  14.],
                  [ 15.,  16.,  17.]]])
b = numpy.array([[ 0.,  1.],
                 [ 2.,  3.],
                 [ 4.,  5.]])
```

We can now construct a single array containing all the data:

```python
c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
# array([[  0.,   1.,   2.,   3.,   4.,   5.,   0.,   1.],
#        [  6.,   7.,   8.,   9.,  10.,  11.,   2.,   3.],
#        [ 12.,  13.,  14.,  15.,  16.,  17.,   4.,   5.]])
```

Now we create views simulating the original `a` and `b`:

```python
a2 = c[:, :a.size//len(a)].reshape(a.shape)
b2 = c[:, a.size//len(a):].reshape(b.shape)
```

The data of `a2` and `b2` is shared with `c`. To shuffle both arrays simultaneously, use `numpy.random.shuffle(c)`.
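Putting the steps together, a self-contained sketch of the whole approach might look like this. It uses the same example values as above, built with `arange` for brevity, and uses `np.shares_memory` to confirm that `a2` and `b2` really are views into `c`.

```python
import numpy as np

# Same example values as above
a = np.arange(18, dtype=float).reshape(3, 2, 3)
b = np.arange(6, dtype=float).reshape(3, 2)

# One combined array: each row holds the flattened a[i] followed by b[i]
c = np.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]

# Views into c that mimic the original shapes
n_a = a.size // len(a)
a2 = c[:, :n_a].reshape(a.shape)
b2 = c[:, n_a:].reshape(b.shape)

views_share_memory = np.shares_memory(a2, c) and np.shares_memory(b2, c)

# Shuffling the rows of c in place reorders a2 and b2 together
np.random.shuffle(c)

# Originally a[i, 0, 0] == 6 * i and b[i, 0] == 2 * i, so this
# relation holds only if the rows were kept paired
rows_still_paired = np.array_equal(a2[:, 0, 0] / 6, b2[:, 0] / 2)
```

If the reshape of a column slice could not be expressed as a view, NumPy would silently return a copy instead, so a `shares_memory` check like the one above is worth keeping when adapting this to other shapes.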

In production code, you would of course try to avoid creating the original `a` and `b` at all and right away create `c`, `a2` and `b2`.

This solution can also be adapted to the case where `a` and `b` have different dtypes.
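One possible adaptation for mixed dtypes, which is my own suggestion and not something spelled out above, is a NumPy structured array: each record holds one row of `a` and one row of `b`, and the field accessors act as views. The field names, shapes and dtypes below are hypothetical.

```python
import numpy as np

n = 4

# Hypothetical record layout: a (2, 3) float block and a (2,) integer block
record = np.dtype([("a", np.float64, (2, 3)), ("b", np.int64, (2,))])
c = np.zeros(n, dtype=record)

# Field access returns views with the per-field dtype
a2 = c["a"]   # shape (n, 2, 3), float64
b2 = c["b"]   # shape (n, 2), int64

# Fill record i with the value i so the pairing can be checked later
for i in range(n):
    a2[i] = i
    b2[i] = i

# Shuffle whole records in place; both views follow along
np.random.shuffle(c)

records_still_paired = np.array_equal(a2[:, 0, 0], b2[:, 0])
```

The trade-off is that `a2` and `b2` are no longer contiguous in memory, which may slow down some operations on them.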