Python Pandas drop columns based on max value of column
Use df.max() to build the index.
In [18]: import numpy as np

In [19]: from pandas import DataFrame

In [23]: df = DataFrame(np.random.randn(3,3), columns=['a','b','c'])

In [36]: df
Out[36]:
          a         b         c
0 -0.928912  0.220573  1.948065
1 -0.310504  0.847638 -0.541496
2 -0.743000 -1.099226 -1.183567

In [24]: df.max()
Out[24]:
a   -0.310504
b    0.847638
c    1.948065
dtype: float64
Next, we make a boolean expression out of this:
In [31]: df.max() > 0
Out[31]:
a    False
b     True
c     True
dtype: bool
Next, you can index df.columns by this (this is called boolean indexing):
In [34]: df.columns[df.max() > 0]
Out[34]: Index([u'b', u'c'], dtype='object')
Finally, pass the resulting column labels to df to select those columns:
In [35]: df[df.columns[df.max() > 0]]
Out[35]:
          b         c
0  0.220573  1.948065
1  0.847638 -0.541496
2 -1.099226 -1.183567
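The same selection can also be written in one step with df.loc, or phrased as an explicit drop. The following is a minimal sketch rather than the only way to do it; the fixed values and the names keep, selected and dropped are illustrative assumptions, chosen so the result is reproducible.

```python
import pandas as pd

# Small fixed frame (illustrative values, not the random data above).
df = pd.DataFrame({
    "a": [-0.9, -0.3, -0.7],
    "b": [0.2, 0.8, -1.1],
    "c": [1.9, -0.5, -1.2],
})

# Boolean Series indexed by column name: True where the column max exceeds 0.
keep = df.max() > 0

# Select the columns to keep directly with a boolean mask on the column axis.
selected = df.loc[:, keep]

# Same result, phrased as dropping the columns that fail the test.
dropped = df.drop(columns=df.columns[~keep])

print(selected.columns.tolist())
```

Both forms return a new DataFrame and leave df unchanged.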
Of course, instead of 0, you can use any value you want as the cutoff for dropping.
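To make the cutoff a parameter, the steps can be wrapped in a small helper. This is a sketch under the assumption that every column is numeric; drop_columns_by_max is a hypothetical name, not a pandas function.

```python
import pandas as pd


def drop_columns_by_max(frame, cutoff):
    """Return a copy of frame without the columns whose maximum is <= cutoff.

    Hypothetical helper; assumes all columns are numeric.
    """
    return frame.loc[:, frame.max() > cutoff].copy()


# Illustrative data with known column maxima: a=-0.3, b=0.8, c=1.9.
df = pd.DataFrame({
    "a": [-0.9, -0.3, -0.7],
    "b": [0.2, 0.8, -1.1],
    "c": [1.9, -0.5, -1.2],
})

low = drop_columns_by_max(df, 0.5)   # keeps b and c
high = drop_columns_by_max(df, 1.0)  # keeps only c
print(low.columns.tolist(), high.columns.tolist())
```

The comparison is strict, so a column whose maximum equals the cutoff is dropped; change > to >= if you want to keep it.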