Pandas Python, select columns based on rows conditions
Use gt and any to filter the df:
In [287]: df.ix[:, df.gt(2).any()]
Out[287]:
          2
0  1.590124
1  2.500397
Here we use ix to select all rows (the first argument, :), and the second argument is a boolean mask of the columns that meet the condition:
In [288]: df.gt(2)
Out[288]:
       0      1      2      3
0  False  False  False  False
1  False  False   True  False

In [289]: df.gt(2).any()
Out[289]:
0    False
1    False
2     True
3    False
dtype: bool
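To make the two steps of the mask concrete, here is a minimal sketch on a made-up frame (the values are illustrative, not the asker's data). It assumes only standard pandas behaviour: gt(2) is the method form of df > 2, and any() reduces down the rows by default, giving one flag per column.

```python
import pandas as pd

# Hypothetical 2x4 frame; the values are for illustration only.
df = pd.DataFrame([[0.5, 1.3, 1.6, 0.2],
                   [0.1, 0.9, 2.5, 1.1]])

# Step 1: element-wise comparison, same shape as df.
cell_mask = df.gt(2)
same_as_operator = cell_mask.equals(df > 2)

# Step 2: collapse each column to a single flag (default axis=0).
col_mask = cell_mask.any()

# For comparison, axis=1 collapses each row instead.
row_mask = cell_mask.any(axis=1)

print(cell_mask.shape)    # (2, 4)
print(col_mask.tolist())  # [False, False, True, False]
print(row_mask.tolist())  # [False, True]
```

The per-column result is what belongs in the column position of the indexer; the per-row result would belong in the row position.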
In your example, you selected the cell value at the first row and second column and then tried to use it to mask the columns. That comparison yields a single boolean rather than a per-column mask, so it just returned the first column, which is why it didn't work:
In [291]: df.iloc[(0,1)]
Out[291]: 1.3296030000000001

In [293]: df.columns[df.iloc[(0,1)]>2]
Out[293]: '0'
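The difference between the two expressions can be sketched as follows, again on a made-up frame. The point is only that comparing one cell produces one flag, while comparing the whole frame and reducing produces one flag per column.

```python
import pandas as pd

# Hypothetical 2x4 frame; the values are for illustration only.
df = pd.DataFrame([[0.5, 1.3, 1.6, 0.2],
                   [0.1, 0.9, 2.5, 1.1]])

# Comparing a single cell gives a single True/False value.
cell_flag = df.iloc[0, 1] > 2

# Comparing the whole frame and reducing gives one flag per column,
# labelled with the column names, which is what a column mask needs.
col_mask = (df > 2).any()

print(bool(cell_flag))
print(len(col_mask) == df.shape[1])
print(list(col_mask.index) == list(df.columns))
```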
Create a mask with df > 2 and any, then select the columns with ix:
import numpy as np  # needed for np.random below
import pandas as pd

np.random.seed(18)
df = pd.DataFrame(np.random.randn(2, 4))
print(df)
          0         1         2         3
0  0.079428  2.190202 -0.134892  0.160518
1  0.442698  0.623391  1.008903  0.394249

print((df > 2).any())
0    False
1     True
2    False
3    False
dtype: bool

print(df.ix[:, (df > 2).any()])
          1
0  2.190202
1  0.623391
EDIT by comment:
You can check your solution step by step:
It seems to work, but it always selects the second column (1, since Python counts from 0) if the condition is True:
print(df.iloc[(0,1)])
2.19020235741

print(df.iloc[(0,1)] > 2)
True

print(df.columns[df.iloc[(0,1)]>2])
1

print(df[df.columns[df.iloc[(0,1)]>2]])
0    2.190202
1    0.623391
Name: 1, dtype: float64
And it always selects the first column (0) if the condition is False, because the booleans True and False are cast to 1 and 0:
np.random.seed(15)
df = pd.DataFrame(np.random.randn(2, 4))
print(df)
          0         1         2         3
0 -0.312328  0.339285 -0.155909 -0.501790
1  0.235569 -1.763605 -1.095862 -1.087766

print(df.iloc[(0,1)])
0.339284706046

print(df.iloc[(0,1)] > 2)
False

print(df.columns[df.iloc[(0,1)]>2])
0

print(df[df.columns[df.iloc[(0,1)]>2]])
0   -0.312328
1    0.235569
Name: 0, dtype: float64
If you change the column names:
np.random.seed(15)
df = pd.DataFrame(np.random.randn(2, 4))
df.columns = ['a','b','c','d']
print(df)
          a         b         c         d
0 -0.312328  0.339285 -0.155909 -0.501790
1  0.235569 -1.763605 -1.095862 -1.087766

print(df.iloc[(0,1)] > 2)
False

print(df[df.columns[df.iloc[(0,1)]>2]])
0   -0.312328
1    0.235569
Name: a, dtype: float64
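The casting of True and False to 1 and 0 can be seen without pandas at all. The sketch below uses a plain Python list as a stand-in for df.columns (an assumption made for illustration, since how df.columns[...] treats a boolean scalar may depend on the pandas and NumPy versions installed).

```python
# bool is a subclass of int in Python, so True and False act as 1 and 0.
labels = ["a", "b", "c", "d"]  # hypothetical stand-in for df.columns

condition_met = 2.19 > 2       # True
condition_not_met = 0.34 > 2   # False

picked_when_true = labels[condition_met]       # position 1
picked_when_false = labels[condition_not_met]  # position 0

print(picked_when_true)       # b
print(picked_when_false)      # a
print(int(True), int(False))  # 1 0
```

So a single boolean used as an index can only ever pick position 0 or position 1, regardless of which columns actually satisfy the condition.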
Quick update: .ix is deprecated (since 0.20.0) and was removed in pandas 1.0. In recent versions of pandas, .loc will do the trick:
df.loc[:, df.gt(2).any()]
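A self-contained sketch of the .loc form, on a made-up frame with hypothetical column names. It also shows a variant that tests only the first row, in case the condition should apply to one specific row rather than to any row; that variant is an assumption about what might be wanted, not something stated in the question.

```python
import pandas as pd

# Hypothetical frame; values and column names are for illustration only.
df = pd.DataFrame([[0.5, 2.3, 1.6, 0.2],
                   [0.1, 0.9, 2.5, 1.1]],
                  columns=["a", "b", "c", "d"])

# Columns where at least one row exceeds 2.
any_row = df.loc[:, df.gt(2).any()]

# Columns where the first row specifically exceeds 2.
# df.iloc[0] > 2 is a boolean Series indexed by the column labels.
first_row = df.loc[:, df.iloc[0] > 2]

print(any_row.columns.tolist())    # ['b', 'c']
print(first_row.columns.tolist())  # ['b']
```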