Numpy sort ndarray on multiple columns


Sort a NumPy ndarray by its 1st, 2nd or 3rd column:

>>> a = np.array([[1,30,200], [2,20,300], [3,10,100]])
>>> a
array([[  1,  30, 200],
       [  2,  20, 300],
       [  3,  10, 100]])
>>> a[a[:,2].argsort()]           # sort by the 3rd column, ascending
array([[  3,  10, 100],
       [  1,  30, 200],
       [  2,  20, 300]])
>>> a[a[:,2].argsort()][::-1]     # sort by the 3rd column, descending
array([[  2,  20, 300],
       [  1,  30, 200],
       [  3,  10, 100]])
>>> a[a[:,1].argsort()]           # sort by the 2nd column, ascending
array([[  3,  10, 100],
       [  2,  20, 300],
       [  1,  30, 200]])

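To explain what is going on here: argsort() does not return sorted values. It returns an array of integer indices that would put its input in sorted order, and indexing the original array with those indices reorders its rows. See https://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html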

>>> x = np.array([15, 30, 4, 80, 6])
>>> np.argsort(x)
array([2, 4, 0, 1, 3])
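To make the link between the index array and the sorted result explicit, here is a minimal sketch that reuses the arrays from above. The variable names order, sorted_x and rows_by_col3 are chosen for illustration only.

```python
import numpy as np

x = np.array([15, 30, 4, 80, 6])
order = np.argsort(x)   # indices that would sort x
sorted_x = x[order]     # indexing with them yields the sorted values

a = np.array([[1, 30, 200], [2, 20, 300], [3, 10, 100]])
# argsort on one column gives a row permutation; indexing a with it
# moves whole rows, so the other columns stay attached to their keys.
rows_by_col3 = a[a[:, 2].argsort()]

print(order)
print(sorted_x)
print(rows_by_col3)
```

Indexing with the result of argsort is what keeps each row intact. Calling np.sort(a, axis=0) instead would sort every column independently and break the rows apart.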

Sort by column 1, then by column 2, then by column 3 (np.lexsort treats the last key in the tuple as the primary key):

>>> a = np.array([[2,30,200], [1,30,200], [1,10,200]])
>>> a
array([[  2,  30, 200],
       [  1,  30, 200],
       [  1,  10, 200]])
>>> a[np.lexsort((a[:,2], a[:,1], a[:,0]))]
array([[  1,  10, 200],
       [  1,  30, 200],
       [  2,  30, 200]])
>>> a[np.lexsort((a[:,2], a[:,1], a[:,0]))][::-1]        # reverse
array([[  2,  30, 200],
       [  1,  30, 200],
       [  1,  10, 200]])
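Reversing with [::-1] flips the order of every key at once. If only one key should be descending, one common approach is to negate that key before passing it to np.lexsort. This is a sketch, not part of the original recipe, and it assumes the key is a signed numeric column: negation does not work for strings and wraps around for unsigned integers.

```python
import numpy as np

a = np.array([[2, 30, 200], [1, 30, 200], [1, 10, 200]])

# Primary key (last in the tuple): column 1, descending via negation.
# Secondary key: column 2, ascending.
ind = np.lexsort((a[:, 1], -a[:, 0]))
mixed = a[ind]

print(mixed)
# [[  2  30 200]
#  [  1  10 200]
#  [  1  30 200]]
```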


Import the data, letting NumPy guess the column types, and sort in place:

import numpy as np
# let numpy guess the type with dtype=None
my_data = np.genfromtxt(infile, dtype=None, names=["a", "b", "c", "d"])
# access columns by name
print(my_data["b"]) # column 1
# sort in place by column 1, then by column 0
my_data.sort(order=["b", "a"])
# save specifying required format (tab separated values)
np.savetxt("sorted.tsv", my_data, fmt="%d\t%d\t%.6f\t%.6f")

Alternatively, specifying the input format and sorting to a new array:

import numpy as np
# tell numpy the first 2 columns are ints and the last 2 are floats
my_data = np.genfromtxt(infile, dtype=[('a', '<i8'), ('b', '<i8'), ('x', '<f8'), ('d', '<f8')])
# access columns by name
print(my_data["b"]) # column 1
# get the indices to sort the array using lexsort
# the last element of the tuple (column 1) is used as the primary key
ind = np.lexsort((my_data["a"], my_data["b"]))
# create a new, sorted array
sorted_data = my_data[ind]
# save specifying required format (tab separated values)
np.savetxt("sorted.tsv", sorted_data, fmt="%d\t%d\t%.6f\t%.6f")

Output:

2   1   2.000000    0.000000
3   1   2.000000    0.000000
4   1   2.000000    0.000000
2   2   100.000000  0.000000
3   2   4.000000    0.000000
4   2   4.000000    0.000000
2   3   100.000000  0.000000
3   3   6.000000    0.000000
4   3   6.000000    0.000000
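The snippets above depend on an input file that is not shown. To try the same steps without one, the following sketch feeds np.genfromtxt from an in-memory io.StringIO buffer and writes the result to another buffer instead of "sorted.tsv". The sample text is an assumption: it simply reuses the nine rows that appear elsewhere on this page, and the field name 'c' is illustrative.

```python
import io
import numpy as np

# Assumed sample input: whitespace-separated, two int and two float columns.
text = """2 1 2.0 0.0
2 2 100.0 0.0
2 3 100.0 0.0
3 1 2.0 0.0
3 2 4.0 0.0
3 3 6.0 0.0
4 1 2.0 0.0
4 2 4.0 0.0
4 3 6.0 0.0
"""

dtype = [('a', '<i8'), ('b', '<i8'), ('c', '<f8'), ('d', '<f8')]
my_data = np.genfromtxt(io.StringIO(text), dtype=dtype)

# Sort the structured array in place: by field "b", ties broken by "a".
my_data.sort(order=["b", "a"])

# Write tab-separated output to a buffer rather than a file on disk.
out = io.StringIO()
np.savetxt(out, my_data, fmt="%d\t%d\t%.6f\t%.6f")
print(out.getvalue())
```

The printed rows come out in the same order as the output listed above, with tab separators.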


With np.lexsort you can sort by several columns at once. The keys must be passed in reverse order of priority: the last key in the tuple is the primary sort key. That means np.lexsort((col_b, col_a)) sorts by col_a first and uses col_b only to break ties:

my_data = np.array([[   2.,    1.,    2.,    0.],
                    [   2.,    2.,  100.,    0.],
                    [   2.,    3.,  100.,    0.],
                    [   3.,    1.,    2.,    0.],
                    [   3.,    2.,    4.,    0.],
                    [   3.,    3.,    6.,    0.],
                    [   4.,    1.,    2.,    0.],
                    [   4.,    2.,    4.,    0.],
                    [   4.,    3.,    6.,    0.]])
ind = np.lexsort((my_data[:,0], my_data[:,1]))
my_data[ind]

result:

array([[  2.,   1.,   2.,   0.],
       [  3.,   1.,   2.,   0.],
       [  4.,   1.,   2.,   0.],
       [  2.,   2., 100.,   0.],
       [  3.,   2.,   4.,   0.],
       [  4.,   2.,   4.,   0.],
       [  2.,   3., 100.,   0.],
       [  3.,   3.,   6.,   0.],
       [  4.,   3.,   6.,   0.]])
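Because the reversed key order is easy to get wrong, it can help to wrap np.lexsort in a small helper that takes the columns in priority order. The function sort_rows below is hypothetical: it is not part of NumPy, only a sketch for 2-D arrays.

```python
import numpy as np

def sort_rows(arr, cols):
    """Return the rows of 2-D `arr` sorted by `cols`, highest priority first."""
    # np.lexsort expects the primary key last, so reverse the priority list.
    keys = tuple(arr[:, c] for c in reversed(cols))
    return arr[np.lexsort(keys)]

my_data = np.array([[2., 1., 2., 0.],
                    [2., 2., 100., 0.],
                    [2., 3., 100., 0.],
                    [3., 1., 2., 0.],
                    [3., 2., 4., 0.],
                    [3., 3., 6., 0.],
                    [4., 1., 2., 0.],
                    [4., 2., 4., 0.],
                    [4., 3., 6., 0.]])

# Sort by column index 1 first, then by column index 0.
result = sort_rows(my_data, [1, 0])
print(result)
```

With cols=[1, 0] this gives the same ordering as np.lexsort((my_data[:,0], my_data[:,1])).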

If you know that your first column is already sorted, you can use:

ind = my_data[:,1].argsort(kind='stable')
my_data[ind]

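A stable sort preserves the existing order of rows with equal keys, so the ordering of the first column survives within each group of equal second-column values. The default kind='quicksort' does not guarantee this, though it is generally faster.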
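As a quick check of this shortcut, the sketch below compares the stable argsort against the two-key np.lexsort on the same nine rows. It assumes, as stated above, that the first column is already in ascending order.

```python
import numpy as np

my_data = np.array([[2., 1., 2., 0.],
                    [2., 2., 100., 0.],
                    [2., 3., 100., 0.],
                    [3., 1., 2., 0.],
                    [3., 2., 4., 0.],
                    [3., 3., 6., 0.],
                    [4., 1., 2., 0.],
                    [4., 2., 4., 0.],
                    [4., 3., 6., 0.]])

# One stable pass on column 1; column 0 is already sorted, so ties keep that order.
ind_stable = my_data[:, 1].argsort(kind='stable')

# Full two-key sort: column 1 is the primary key, column 0 breaks ties.
ind_lex = np.lexsort((my_data[:, 0], my_data[:, 1]))

same = np.array_equal(ind_stable, ind_lex)
print(same)  # True for this data
```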