
Pandas Python, select columns based on rows conditions


Use gt and any to filter the df:

    In [287]: df.ix[:, df.gt(2).any()]
    Out[287]:
              2
    0  1.590124
    1  2.500397

Here ix selects all rows with its first argument (:), and its second argument is a boolean mask of the columns that meet the condition:

    In [288]: df.gt(2)
    Out[288]:
           0      1      2      3
    0  False  False  False  False
    1  False  False   True  False

    In [289]: df.gt(2).any()
    Out[289]:
    0    False
    1    False
    2     True
    3    False
    dtype: bool
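A self-contained sketch of the same idea on small, fixed data instead of random numbers. The values and the threshold of 2 are made up for illustration, and it uses loc because ix no longer exists in current pandas:

```python
import pandas as pd

# Small fixed frame; values are arbitrary, chosen so only column 2 has a value > 2
df = pd.DataFrame([[0.5, 1.0, 1.5, 0.2],
                   [0.3, 1.9, 2.5, 0.1]])

# Element-wise comparison: a boolean DataFrame with the same shape as df
above = df.gt(2)

# any() reduces each column (axis=0 is the default) to a single True/False
col_mask = above.any()
print(col_mask)

# Keep all rows (:) and only the columns where the mask is True
result = df.loc[:, col_mask]
print(result)
```

The mask is a Series indexed by the column labels, so loc can align it with the columns directly.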

In your example, you selected the value of a single cell (first row, second column) and then tried to use it to mask the columns. A single comparison is not a per-column mask, so this just returned the first column, which is why it didn't work:

    In [291]: df.iloc[(0,1)]
    Out[291]: 1.3296030000000001

    In [293]: df.columns[df.iloc[(0,1)]>2]
    Out[293]: '0'
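To see the difference, compare a single-cell comparison with one over a whole row. Assuming the intent was to keep the columns whose value in the first row exceeds 2 (an assumption about what the question wanted), comparing the full row df.iloc[0] gives one boolean per column. The data here is made up:

```python
import pandas as pd

df = pd.DataFrame([[0.5, 3.0, 1.5, 2.2],
                   [0.3, 1.9, 2.5, 0.1]])

# One cell -> one boolean; it says nothing about the other columns
single = df.iloc[0, 1] > 2
print(type(single), single)

# A whole row -> a boolean Series indexed by the column labels
row_mask = df.iloc[0] > 2
print(row_mask)

# Keep the columns whose first-row value is greater than 2
print(df.loc[:, row_mask])
```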


Create a boolean mask with df > 2, reduce it per column with any, and then select the columns with ix:

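    import numpy as np
    import pandas as pd

    np.random.seed(18)
    df = pd.DataFrame(np.random.randn(2, 4))
    print(df)
    #           0         1         2         3
    # 0  0.079428  2.190202 -0.134892  0.160518
    # 1  0.442698  0.623391  1.008903  0.394249

    # True for every column that has at least one value > 2
    print((df > 2).any())
    # 0    False
    # 1     True
    # 2    False
    # 3    False
    # dtype: bool

    # all rows (:), only the columns flagged True
    print(df.ix[:, (df > 2).any()])
    #           1
    # 0  2.190202
    # 1  0.623391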
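any() reduces along axis=0 by default, which is what produces one flag per column. A small sketch on fixed, made-up values contrasting that with axis=1, which gives one flag per row and would be used to filter rows instead:

```python
import pandas as pd

df = pd.DataFrame([[0.1, 2.2, -0.1, 0.2],
                   [0.4, 0.6, 1.0, 0.4],
                   [2.7, 0.0, 0.0, 0.0]])

mask = df > 2

# axis=0 (default): one flag per column -> select columns
per_column = mask.any()
cols = df.loc[:, per_column]

# axis=1: one flag per row -> select rows
per_row = mask.any(axis=1)
rows = df.loc[per_row]

print(cols)
print(rows)
```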

EDIT (in response to a comment):

You can check your solution step by step:

It seems to work, but it always selects the second column (label 1, since Python counts from 0) when the condition is True:

    print(df.iloc[(0,1)])
    # 2.19020235741

    print(df.iloc[(0,1)] > 2)
    # True

    print(df.columns[df.iloc[(0,1)] > 2])
    # 1

    print(df[df.columns[df.iloc[(0,1)] > 2]])
    # 0    2.190202
    # 1    0.623391
    # Name: 1, dtype: float64

And it selects the first column (label 0) when the condition is False, because the booleans True and False are cast to 1 and 0:

    np.random.seed(15)
    df = pd.DataFrame(np.random.randn(2, 4))
    print(df)
    #           0         1         2         3
    # 0 -0.312328  0.339285 -0.155909 -0.501790
    # 1  0.235569 -1.763605 -1.095862 -1.087766

    print(df.iloc[(0,1)])
    # 0.339284706046

    print(df.iloc[(0,1)] > 2)
    # False

    print(df.columns[df.iloc[(0,1)] > 2])
    # 0

    print(df[df.columns[df.iloc[(0,1)] > 2]])
    # 0   -0.312328
    # 1    0.235569
    # Name: 0, dtype: float64
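The cast itself is plain Python: bool is a subclass of int, so True and False act as positions 1 and 0. A minimal sketch of that effect, converting explicitly with int() so it does not depend on how a particular pandas/NumPy version treats a boolean scalar used as an index:

```python
import pandas as pd

cols = pd.Index(['a', 'b', 'c', 'd'])

# bool is a subclass of int in Python
print(isinstance(True, int), True == 1, False == 0)

# Used as a position, the comparison result can only ever pick column 0 or column 1
for flag in (True, False):
    print(flag, '->', cols[int(flag)])
```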

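If you change the column names, the first column ('a') is still selected, because the boolean is used as a position, not as a label: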

    np.random.seed(15)
    df = pd.DataFrame(np.random.randn(2, 4))
    df.columns = ['a','b','c','d']
    print(df)
    #           a         b         c         d
    # 0 -0.312328  0.339285 -0.155909 -0.501790
    # 1  0.235569 -1.763605 -1.095862 -1.087766

    print(df.iloc[(0,1)] > 2)
    # False

    print(df[df.columns[df.iloc[(0,1)] > 2]])
    # 0   -0.312328
    # 1    0.235569
    # Name: a, dtype: float64


Quick update: .ix was deprecated in pandas 0.20.0 and removed in 1.0.0. In recent versions of pandas, .loc does the trick:

df.loc[:, df.gt(2).any()]
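The same mask can also be applied by position with iloc, which works with a plain boolean array, one entry per column, so the sketch below converts the mask with to_numpy() first. The data and string column labels are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': [0.5, 0.3],
                   'b': [3.1, 1.9],
                   'c': [1.5, 2.5],
                   'd': [0.2, 0.1]})

mask = df.gt(2).any()

# Label-based: loc aligns the boolean Series on the column labels
by_label = df.loc[:, mask]

# Position-based: iloc with a plain boolean array, one entry per column
by_position = df.iloc[:, mask.to_numpy()]

print(by_label)
print(by_label.equals(by_position))
```

loc is usually the simpler choice here, since the mask from any() already carries the column labels.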