Numpy sort ndarray on multiple columns

Sort a NumPy ndarray by its 1st, 2nd or 3rd column:

>>> import numpy as np
>>> a = np.array([[1,30,200], [2,20,300], [3,10,100]])
>>> a
array([[  1,  30, 200],
       [  2,  20, 300],
       [  3,  10, 100]])
>>> a[a[:,2].argsort()]           # sort by the 3rd column ascending
array([[  3,  10, 100],
       [  1,  30, 200],
       [  2,  20, 300]])
>>> a[a[:,2].argsort()][::-1]     # sort by the 3rd column descending
array([[  2,  20, 300],
       [  1,  30, 200],
       [  3,  10, 100]])
>>> a[a[:,1].argsort()]           # sort by the 2nd column ascending
array([[  3,  10, 100],
       [  2,  20, 300],
       [  1,  30, 200]])

To explain what is going on here: argsort() returns an array of integer indices that would sort its input, so indexing the array with that result reorders its rows: https://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html

>>> x = np.array([15, 30, 4, 80, 6])
>>> np.argsort(x)
array([2, 4, 0, 1, 3])
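To connect the index array back to row sorting, here is a minimal sketch. The names `order`, `sorted_x`, `rows` and `by_col2` are illustrative and not from the original answer. It shows that indexing with the result of argsort gives the sorted values, and that the same index trick reorders whole rows.

```python
import numpy as np

x = np.array([15, 30, 4, 80, 6])
order = np.argsort(x)      # indices that would sort x
sorted_x = x[order]        # integer-array indexing applies that order

rows = np.array([[1, 30, 200], [2, 20, 300], [3, 10, 100]])
# argsort of one column gives a row order; indexing with it moves whole rows
by_col2 = rows[rows[:, 1].argsort()]

print(order)
print(sorted_x)
print(by_col2)
```

The key point is that argsort never moves data itself; it only describes an order, which you then apply to any array of matching length.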

Sort by column 1, then by column 2, then by column 3 (np.lexsort uses the last key in the tuple as the primary key):

>>> a = np.array([[2,30,200], [1,30,200], [1,10,200]])
>>> a
array([[  2,  30, 200],
       [  1,  30, 200],
       [  1,  10, 200]])
>>> a[np.lexsort((a[:,2], a[:,1], a[:,0]))]
array([[  1,  10, 200],
       [  1,  30, 200],
       [  2,  30, 200]])
>>> a[np.lexsort((a[:,2], a[:,1], a[:,0]))][::-1]        # reverse
array([[  2,  30, 200],
       [  1,  30, 200],
       [  1,  10, 200]])
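Reversing the whole result with [::-1] flips every key at once. If you need one key ascending and another descending, one possible approach (not part of the original answer) is to negate the descending key before passing it to np.lexsort. This sketch assumes signed numeric columns; negation does not work for unsigned integers or strings. The names `idx` and `mixed` are illustrative.

```python
import numpy as np

a = np.array([[2, 30, 200], [1, 30, 200], [1, 10, 200]])

# Primary key: column 1 ascending. Secondary key: column 2 descending.
# np.lexsort treats the LAST key as the primary one, and negating a
# signed numeric key reverses its order.
idx = np.lexsort((-a[:, 1], a[:, 0]))
mixed = a[idx]

print(mixed)
```

Here the two rows that share the value 1 in the first column come out with 30 before 10 in the second column, while the first column itself stays ascending.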


Import the data, letting NumPy guess the column types, and sort in place:

import numpy as np

# let numpy guess the type with dtype=None
my_data = np.genfromtxt(infile, dtype=None, names=["a", "b", "c", "d"])

# access columns by name
print(my_data["b"])  # column 1

# sort in place by column 1 ("b"), breaking ties with column 0 ("a")
my_data.sort(order=["b", "a"])

# save specifying required format (tab separated values)
np.savetxt("sorted.tsv", my_data, fmt="%d\t%d\t%.6f\t%.6f")

Alternatively, specifying the input format and sorting to a new array:

import numpy as np

# tell numpy the first 2 columns are int and the last 2 are floats
my_data = np.genfromtxt(infile, dtype=[('a', '<i8'), ('b', '<i8'), ('x', '<f8'), ('d', '<f8')])

# access columns by name
print(my_data["b"])  # column 1

# get the indices to sort the array using lexsort
# the last element of the tuple (column 1) is used as the primary key
ind = np.lexsort((my_data["a"], my_data["b"]))

# create a new, sorted array
sorted_data = my_data[ind]

# save specifying required format (tab separated values)
np.savetxt("sorted.tsv", sorted_data, fmt="%d\t%d\t%.6f\t%.6f")

Output:

2   1   2.000000    0.000000
3   1   2.000000    0.000000
4   1   2.000000    0.000000
2   2   100.000000  0.000000
3   2   4.000000    0.000000
4   2   4.000000    0.000000
2   3   100.000000  0.000000
3   3   6.000000    0.000000
4   3   6.000000    0.000000
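The snippets above rely on an `infile` that is not shown. As a self-contained sketch, the version below substitutes an in-memory `io.StringIO` buffer with made-up sample rows for the real file (both the buffer and the data are assumptions for illustration) and writes the result to another buffer instead of sorted.tsv.

```python
import io
import numpy as np

# Hypothetical stand-in for the input file: four whitespace-separated columns.
text = io.StringIO(
    "4 3 6.0 0.0\n"
    "2 1 2.0 0.0\n"
    "3 2 4.0 0.0\n"
    "2 2 100.0 0.0\n"
)

data = np.genfromtxt(
    text, dtype=[("a", "<i8"), ("b", "<i8"), ("c", "<f8"), ("d", "<f8")]
)

# Sort in place: primary key "b", ties broken by "a".
data.sort(order=["b", "a"])

# Write tab separated values to a buffer rather than to disk.
out = io.StringIO()
np.savetxt(out, data, fmt="%d\t%d\t%.6f\t%.6f")
result = out.getvalue()

print(result)
```

Note that `order=` lists keys from most to least significant, which is the opposite of the key order expected by np.lexsort.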


With np.lexsort you can sort based on several columns simultaneously. The columns that you want to sort by need to be passed in reverse order of priority, with the primary key last. That means np.lexsort((col_b,col_a)) first sorts by col_a, and then by col_b:

my_data = np.array([[   2.,    1.,    2.,    0.],
                    [   2.,    2.,  100.,    0.],
                    [   2.,    3.,  100.,    0.],
                    [   3.,    1.,    2.,    0.],
                    [   3.,    2.,    4.,    0.],
                    [   3.,    3.,    6.,    0.],
                    [   4.,    1.,    2.,    0.],
                    [   4.,    2.,    4.,    0.],
                    [   4.,    3.,    6.,    0.]])
ind = np.lexsort((my_data[:,0], my_data[:,1]))
my_data[ind]

result:

array([[  2.,   1.,   2.,   0.],
       [  3.,   1.,   2.,   0.],
       [  4.,   1.,   2.,   0.],
       [  2.,   2., 100.,   0.],
       [  3.,   2.,   4.,   0.],
       [  4.,   2.,   4.,   0.],
       [  2.,   3., 100.,   0.],
       [  3.,   3.,   6.,   0.],
       [  4.,   3.,   6.,   0.]])

If you know that your first column is already sorted, you can use:

ind = my_data[:,1].argsort(kind='stable')
my_data[ind]

This makes sure that the existing order is preserved among rows with equal values in the second column. The default quicksort is not a stable sort and gives no such guarantee, though it is typically faster.
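The same idea extends to unsorted input: sorting by the secondary key first and then stably by the primary key should give the same row order as a single np.lexsort call. The sketch below checks this on a small made-up array; the sample values and the names `step1`, `chained` and `direct` are assumptions for illustration.

```python
import numpy as np

my_data = np.array([[3., 2., 4.],
                    [2., 1., 2.],
                    [4., 1., 2.],
                    [2., 2., 100.]])

# Step 1: sort by the secondary key (column 0).
step1 = my_data[my_data[:, 0].argsort(kind="stable")]
# Step 2: stable sort by the primary key (column 1) keeps the step 1
# order among rows that tie on column 1.
chained = step1[step1[:, 1].argsort(kind="stable")]

# Single call with the primary key last.
direct = my_data[np.lexsort((my_data[:, 0], my_data[:, 1]))]

print(np.array_equal(chained, direct))
```

Chaining is mostly useful when one key is already sorted, as in the case above; otherwise np.lexsort expresses the intent in one call.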
