Pandas Python, select columns based on rows conditions

python pandas dataframe conditional-statements

Use gt and any to filter the df:

    In [287]: df.ix[:, df.gt(2).any()]
    Out[287]:
              2
    0  1.590124
    1  2.500397

Here we use ix with : as the first argument to select all rows; the second argument is a boolean mask of the columns that meet the condition:

    In [288]: df.gt(2)
    Out[288]:
           0      1      2      3
    0  False  False  False  False
    1  False  False   True  False

    In [289]: df.gt(2).any()
    Out[289]:
    0    False
    1    False
    2     True
    3    False
    dtype: bool

In your example, you selected the single cell value at the first row and second column and then tried to use it to mask the columns. The comparison yields one boolean rather than a mask with one entry per column, so it just returned the first column, which is why it didn't work:

    In [291]: df.iloc[(0,1)]
    Out[291]: 1.3296030000000001

    In [293]: df.columns[df.iloc[(0,1)]>2]
    Out[293]: '0'
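If the goal is to keep the columns whose value in one particular row exceeds the threshold, comparing the whole row gives a proper boolean mask with one entry per column. A minimal sketch, assuming a small hand-written frame (the values are illustrative, not the data from the question) and using .loc, which works in current pandas:

```python
import pandas as pd

# Illustrative data (an assumption, not the frame from the question).
df = pd.DataFrame([[0.5, 1.3, 2.7, 0.1],
                   [0.2, 0.9, 2.5, 3.4]])

# Comparing a whole row yields one boolean per column ...
first_row_mask = df.iloc[0] > 2

# ... which can then be used to select columns by label with .loc.
selected = df.loc[:, first_row_mask]
print(selected)
```

Here only the first row is tested, so a column is kept even if its other rows fall below the threshold, and dropped even if another row exceeds it.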


Create a mask with df > 2 and any, then select the columns with ix:

    import pandas as pd
    import numpy as np

    np.random.seed(18)
    df = pd.DataFrame(np.random.randn(2, 4))
    print(df)
              0         1         2         3
    0  0.079428  2.190202 -0.134892  0.160518
    1  0.442698  0.623391  1.008903  0.394249

    print ((df>2).any())
    0    False
    1     True
    2    False
    3    False
    dtype: bool

    print (df.ix[:, (df>2).any()])
              1
    0  2.190202
    1  0.623391

EDIT by comment:

You can check your solution step by step:

It seems to work, but it always selects the second column (label 1, since Python counts from 0) when the condition is True:

    print (df.iloc[(0,1)])
    2.19020235741

    print (df.iloc[(0,1)] > 2)
    True

    print (df.columns[df.iloc[(0,1)]>2])
    1

    print (df[df.columns[df.iloc[(0,1)]>2]])
    0    2.190202
    1    0.623391
    Name: 1, dtype: float64

And it selects the first column (label 0) when the condition is False, because the booleans True and False are cast to 1 and 0:

    np.random.seed(15)
    df = pd.DataFrame(np.random.randn(2, 4))
    print (df)
              0         1         2         3
    0 -0.312328  0.339285 -0.155909 -0.501790
    1  0.235569 -1.763605 -1.095862 -1.087766

    print (df.iloc[(0,1)])
    0.339284706046

    print (df.iloc[(0,1)] > 2)
    False

    print (df.columns[df.iloc[(0,1)]>2])
    0

    print (df[df.columns[df.iloc[(0,1)]>2]])
    0   -0.312328
    1    0.235569
    Name: 0, dtype: float64

If you change the column names:

    np.random.seed(15)
    df = pd.DataFrame(np.random.randn(2, 4))
    df.columns = ['a','b','c','d']
    print (df)
              a         b         c         d
    0 -0.312328  0.339285 -0.155909 -0.501790
    1  0.235569 -1.763605 -1.095862 -1.087766

    print (df.iloc[(0,1)] > 2)
    False

    print (df[df.columns[df.iloc[(0,1)]>2]])
    0   -0.312328
    1    0.235569
    Name: a, dtype: float64
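The cast described above comes from Python itself, where bool is a subclass of int, so a single True or False used as a position picks element 1 or 0. A small sketch using a plain list of column names (an illustration only; how a pandas Index treats a scalar boolean may differ between versions):

```python
# bool is a subclass of int in Python, so True == 1 and False == 0.
column_names = ['a', 'b', 'c', 'd']

picked_if_true = column_names[True]    # same as column_names[1]
picked_if_false = column_names[False]  # same as column_names[0]

print(picked_if_true, picked_if_false)
```

This is why a scalar comparison never acts as a column mask: it can only ever point at one of the first two positions.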


Quick update: .ix has been deprecated since pandas 0.20.0. In recent versions of pandas, .loc will do the trick:

df.loc[:, df.gt(2).any()]
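For reference, a self-contained sketch of the .loc approach, assuming a small hand-written frame (the column names and values are illustrative) so the result is easy to check by eye:

```python
import pandas as pd

# Illustrative data (an assumption); only column 'c' holds a value above 2.
df = pd.DataFrame({'a': [0.1, 0.4],
                   'b': [1.3, 0.6],
                   'c': [1.6, 2.5],
                   'd': [0.2, 0.4]})

# True for every column that has at least one value greater than 2.
column_mask = df.gt(2).any()

# Keep all rows, and only the columns flagged by the mask.
result = df.loc[:, column_mask]
print(result)
```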
