Finding n lowest values for each row in a dataframe

python pandas min

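Use np.argsort to get, for each row, the column positions that would sort the underlying array, then keep the first N of them. Indexing both the values and the column Index with those positions gives you all of the information you need. We'll create a MultiIndex on the columns so we can store both the column headers and the values in the same DataFrame. The top level separates values from column labels, and the second level is your nth lowest indicator (0 is the lowest).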

Example:

import pandas as pd
import numpy as np

np.random.seed(1)
df = pd.DataFrame(np.random.randint(1, 100000, (1739, 26)))
df.columns = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')

N = 7  # 150 in your case

# Column positions of the N lowest values in each row, lowest first
idx = np.argsort(df.values, 1)[:, 0:N]

# Index the column labels with idx as well, so each row gets its own N labels
pd.concat([pd.DataFrame(np.take_along_axis(df.to_numpy(), idx, axis=1), index=df.index),
           pd.DataFrame(df.columns.to_numpy()[idx], index=df.index)],
          keys=['Value', 'Columns'], axis=1)

Output:

      Value                                           Columns
          0      1      2      3      4      5      6       0  1  2  3  4  5  6
0      5193   7752   8445  19947  20610  21441  21759       C  K  U  V  I  G  P
1       432   3607  16278  17138  19434  26104  33879       R  J  W  C  B  D  G
2        16   1047   1845   9553  12314  13784  19432       K  S  E  F  M  O  U
3       244   5272  10836  13682  29237  33230  34448       K  Q  A  S  X  W  G
4      9765  11275  13160  22808  30870  33484  42760       K  T  L  U  C  D  M
5      2034   2179   4980   7184  14826  15238  22807       Z  H  F  Q  L  R  X
...
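To check the argsort approach by hand, here is a minimal sketch on a tiny frame. The frame `small`, the value of `n`, and the names `order`, `lowest_values` and `lowest_labels` are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 frame, small enough to verify by eye.
small = pd.DataFrame(
    [[40, 10, 30, 20],
     [5, 8, 1, 9],
     [7, 3, 6, 2]],
    columns=list("ABCD"),
)
n = 2

# Column positions of each row's n lowest values, lowest first.
order = np.argsort(small.to_numpy(), axis=1)[:, :n]

# Gather the values and the matching column labels with the same positions.
lowest_values = np.take_along_axis(small.to_numpy(), order, axis=1)
lowest_labels = small.columns.to_numpy()[order]

print(lowest_values)
print(lowest_labels)
```

For the first row the two lowest values are 10 and 20, found in columns B and D, so the values and labels line up position by position.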


You can use heapq.nsmallest to find the n smallest numbers in a list. This can be quickly applied to each row of a dataframe using .apply:

import pandas as pd
import numpy as np
import heapq

df = pd.DataFrame(np.random.randn(1000, 1000))

# Find the 150 smallest values in each row
smallest = df.apply(lambda x: heapq.nsmallest(150, x), axis=1)

Each element of smallest is now a list of the 150 smallest values in the corresponding row of df, in ascending order.

This can be converted to a dataframe using:

smallest_df = pd.DataFrame(smallest.values.tolist())

Each row of this dataframe corresponds to the same row of the original dataframe. There are 150 columns, holding the 150 smallest values of that row.

smallest_df.head()
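The heapq approach above returns only the values, so the column labels are lost. If you also need the labels, one possible variation is to pass (label, value) pairs to heapq.nsmallest and compare on the value. This is a sketch on a made-up frame, not part of the original answer; `small`, `n` and `labelled` are hypothetical names.

```python
import heapq
import pandas as pd

# Hypothetical 2x4 frame for illustration.
small = pd.DataFrame(
    [[40, 10, 30, 20],
     [5, 8, 1, 9]],
    columns=list("ABCD"),
)
n = 2

# For each row, keep the n (label, value) pairs with the smallest value.
# row.items() yields (column label, value); the key compares on the value.
labelled = [
    heapq.nsmallest(n, row.items(), key=lambda item: item[1])
    for _, row in small.iterrows()
]

for pairs in labelled:
    print([(label, int(value)) for label, value in pairs])
```

Each inner list is sorted by value, lowest first, so the first row yields B with 10 followed by D with 20.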


If I understand correctly, the question boils down to getting the k smallest numbers in a list of M (>k) numbers. That operation is then applied to each row individually.

If numpy is available and order does not matter, you could try np.argpartition. Given a parameter k, it rearranges the array so that the element at position k ends up where it would be in a fully sorted array, with all smaller elements before it and all larger elements after it (each group in unspecified order):

import numpy as np

row = np.array([1, 6, 2, 12, 7, 8, 9, 11, 15, 26])
k = 5
idx = np.argpartition(row, k)[:k]
print(idx)
print(row[idx])
-->
[1 0 2 4 5]
[6 1 2 7 8]

Edit: This also works row-wise for full arrays:

import numpy as np

data = np.array([
    [1, 6, 2, 12, 7, 8, 9, 11, 15, 26],
    [1, 65, 2, 12, 7, 8, 9, 11, 15, 26],
    [16, 6, 2, 12, 7, 8, 9, 11, 15, 26]
])
k = 5
idx = np.argpartition(data, k)[:,:k]
print(idx)
-->
[[1 0 2 4 5]
 [2 0 4 5 6]
 [4 2 1 5 6]]
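If the order of the k values does matter, a possible follow-up is to gather the partitioned values and sort only that small slice, which avoids sorting each full row. This is a sketch under that assumption, reusing the array from the answer; the names `part` and `smallest` are introduced here for illustration.

```python
import numpy as np

data = np.array([
    [1, 6, 2, 12, 7, 8, 9, 11, 15, 26],
    [1, 65, 2, 12, 7, 8, 9, 11, 15, 26],
    [16, 6, 2, 12, 7, 8, 9, 11, 15, 26],
])
k = 5

# Positions of the k smallest entries in each row, in unspecified order.
part = np.argpartition(data, k, axis=1)[:, :k]

# Gather those k values per row, then sort just the k-wide slice.
smallest = np.sort(np.take_along_axis(data, part, axis=1), axis=1)

print(smallest)
```

The result does not depend on the order argpartition happens to return, because the final sort runs over exactly the k selected values of each row.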
