Finding n lowest values for each row in a dataframe


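Use .argsort to get, for each row of the underlying array, the column positions that would sort that row, and keep the first N of them. Use those positions to slice both the values and the column Index, which gives you all of the information you need. We'll concatenate the two results under a MultiIndex on the columns so that the column headers and the values are stored in the same DataFrame. With the code below, the first level says whether a block holds values or column headers, and the second level is your nth lowest indicator (0 is the lowest).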

Example:

import pandas as pd
import numpy as np

np.random.seed(1)
df = pd.DataFrame(np.random.randint(1, 100000, (1739, 26)))
df.columns = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')

N = 7  # 150 in your case

# Column positions of the N lowest values in each row, in ascending order
idx = np.argsort(df.values, 1)[:, 0:N]

# Index the column labels with idx so each row gets its own N labels
pd.concat([pd.DataFrame(np.take_along_axis(df.to_numpy(), idx, axis=1), index=df.index),
           pd.DataFrame(df.columns.to_numpy()[idx], index=df.index)],
          keys=['Value', 'Columns'], axis=1)

Output:

   Value                                           Columns
       0      1      2      3      4      5      6       0  1  2  3  4  5  6
0   5193   7752   8445  19947  20610  21441  21759       C  K  U  V  I  G  P
1    432   3607  16278  17138  19434  26104  33879       R  J  W  C  B  D  G
2     16   1047   1845   9553  12314  13784  19432       K  S  E  F  M  O  U
3    244   5272  10836  13682  29237  33230  34448       K  Q  A  S  X  W  G
4   9765  11275  13160  22808  30870  33484  42760       K  T  L  U  C  D  M
5   2034   2179   4980   7184  14826  15238  22807       Z  H  F  Q  L  R  X
...
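To make the indexing easier to check by hand, here is a minimal sketch of the same argsort idea on a tiny frame. The frame `df_small`, its values and the names `lowest_values` and `lowest_columns` are made up for illustration. The stable sort kind is an added assumption so that tied values keep their original column order.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame so the result can be verified by eye.
df_small = pd.DataFrame(
    [[5, 1, 9, 3],
     [8, 7, 2, 6],
     [4, 4, 0, 10]],
    columns=list("ABCD"),
)
n = 2  # number of lowest values to keep per row

values = df_small.to_numpy()
# A stable sort keeps tied values in their original column order.
idx = np.argsort(values, axis=1, kind="stable")[:, :n]

# Pick the n lowest values and the matching column labels, row by row.
lowest_values = np.take_along_axis(values, idx, axis=1)
lowest_columns = df_small.columns.to_numpy()[idx]

print(lowest_values)   # rows: [1 3], [2 6], [0 4]
print(lowest_columns)  # rows: ['B' 'D'], ['C' 'D'], ['C' 'A']
```

Indexing the 1-D array of column labels with the 2-D `idx` array is what turns positions into one row of labels per DataFrame row.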


You can use heapq.nsmallest to find the n smallest numbers in a list. This can be quickly applied to each row of a dataframe using .apply:

import pandas as pd
import numpy as np
import heapq

df = pd.DataFrame(np.random.randn(1000, 1000))

# Find the 150 smallest values in each row
smallest = df.apply(lambda x: heapq.nsmallest(150, x), axis=1)

Each row of smallest is now a list of the 150 smallest values in the corresponding row in df.

This can be converted to a dataframe using:

smallest_df = pd.DataFrame(smallest.values.tolist())

smallest_df has one row for each row of the original dataframe and 150 columns, which hold the 150 smallest values of that row in ascending order.

smallest_df.head()
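If you also need the original row labels or the column each value came from, the heapq approach can be extended along these lines. This is only a sketch: the small frame `df_small` and the names `smallest_small_df` and `pairs` are hypothetical, and pairing each value with its column label is one possible way to carry the label along, not part of the answer above.

```python
import heapq

import pandas as pd

# Hypothetical small frame for illustration.
df_small = pd.DataFrame(
    [[5, 1, 9, 3],
     [8, 7, 2, 6]],
    index=["r1", "r2"],
    columns=list("ABCD"),
)
n = 2

# Values only; passing index= keeps the original row labels.
smallest = df_small.apply(lambda row: heapq.nsmallest(n, row), axis=1)
smallest_small_df = pd.DataFrame(smallest.tolist(), index=df_small.index)

# (value, column) pairs: tuples compare by value first, so nsmallest
# orders by value and carries the column label along with it.
pairs = [
    heapq.nsmallest(n, zip(row, df_small.columns))
    for row in df_small.to_numpy().tolist()
]

print(smallest_small_df)
print(pairs)  # [[(1, 'B'), (3, 'D')], [(2, 'C'), (6, 'D')]]
```

With tied values, the tuple comparison falls back to the column label, so ties are broken alphabetically in this sketch.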


If I understand correctly, the question boils down to getting the k smallest numbers in a list of M (>k) numbers. This is then applied to each row individually.

If numpy is available and order does not matter, you could try using argpartition. Given a parameter k, it returns indices that partition the array so that the element at position k is the one that would be there if the array were sorted. All smaller elements come before it and all larger ones after it, in unspecified order. The first k indices therefore point at the k smallest values:

import numpy as np

row = np.array([1, 6, 2, 12, 7, 8, 9, 11, 15, 26])
k = 5
idx = np.argpartition(row, k)[:k]
print(idx)
print(row[idx])
-->
[1 0 2 4 5]
[6 1 2 7 8]

Edit: This also works row-wise for full 2-D arrays, because argpartition operates along the last axis by default:

import numpy as np

data = np.array([
    [1, 6, 2, 12, 7, 8, 9, 11, 15, 26],
    [1, 65, 2, 12, 7, 8, 9, 11, 15, 26],
    [16, 6, 2, 12, 7, 8, 9, 11, 15, 26]])
k = 5
idx = np.argpartition(data, k)[:, :k]
print(idx)
-->
[[1 0 2 4 5]
 [2 0 4 5 6]
 [4 2 1 5 6]]
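If the k smallest values are needed in ascending order together with their column names, one option is to partition first and then sort only the k candidates. The following is a sketch under assumptions: the frame `df_rows` reuses the three example rows with made-up column labels A to J, and the names `part`, `order`, `k_smallest` and `k_columns` are hypothetical. Note that k has to be smaller than the number of columns for argpartition to accept it.

```python
import numpy as np
import pandas as pd

# Hypothetical frame reusing the three example rows above.
df_rows = pd.DataFrame(
    [[1, 6, 2, 12, 7, 8, 9, 11, 15, 26],
     [1, 65, 2, 12, 7, 8, 9, 11, 15, 26],
     [16, 6, 2, 12, 7, 8, 9, 11, 15, 26]],
    columns=list("ABCDEFGHIJ"),
)
k = 5
values = df_rows.to_numpy()

# Step 1: positions of the k smallest values per row, in unspecified order.
part = np.argpartition(values, k, axis=1)[:, :k]

# Step 2: sort only those k candidates to restore ascending order.
order = np.argsort(np.take_along_axis(values, part, axis=1), axis=1)
idx = np.take_along_axis(part, order, axis=1)

k_smallest = pd.DataFrame(np.take_along_axis(values, idx, axis=1), index=df_rows.index)
k_columns = pd.DataFrame(df_rows.columns.to_numpy()[idx], index=df_rows.index)

print(k_smallest)
print(k_columns)
```

Sorting k candidates instead of the whole row is where the saving comes from when k is much smaller than the number of columns.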