
Python Pandas drop columns based on max value of column


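Use df.max(), which returns the maximum of each column, to build the index (the session below assumes NumPy has already been imported as np):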

In [19]: from pandas import DataFrame

In [23]: df = DataFrame(np.random.randn(3,3), columns=['a','b','c'])

In [36]: df
Out[36]:
          a         b         c
0 -0.928912  0.220573  1.948065
1 -0.310504  0.847638 -0.541496
2 -0.743000 -1.099226 -1.183567

In [24]: df.max()
Out[24]:
a   -0.310504
b    0.847638
c    1.948065
dtype: float64

Next, we make a boolean expression out of this:

In [31]: df.max() > 0
Out[31]:
a    False
b     True
c     True
dtype: bool

Next, you can index df.columns with this boolean Series (this is called boolean indexing):

In [34]: df.columns[df.max() > 0]
Out[34]: Index([u'b', u'c'], dtype='object')

Finally, pass the resulting column labels back to the DataFrame to select those columns:

In [35]: df[df.columns[df.max() > 0]]
Out[35]:
          b         c
0  0.220573  1.948065
1  0.847638 -0.541496
2 -1.099226 -1.183567
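
The steps above can also be collapsed into a single expression. The following is a minimal sketch, not the only way to do it: it hard-codes the same values as the session above so the result is reproducible, and the names mask, kept, and kept_via_drop are chosen here purely for illustration.

```python
import pandas as pd

# Same values as the session above, hard-coded so the result is reproducible.
df = pd.DataFrame({
    "a": [-0.928912, -0.310504, -0.743000],
    "b": [0.220573, 0.847638, -1.099226],
    "c": [1.948065, -0.541496, -1.183567],
})

# Boolean Series indexed by column label: True where the column max exceeds 0.
mask = df.max() > 0

# Select every row, and only the columns where the mask is True.
kept = df.loc[:, mask]

# Equivalent result: explicitly drop the columns where the mask is False.
kept_via_drop = df.drop(columns=df.columns[~mask])

print(list(kept.columns))
```

Both forms return a new DataFrame and leave df itself unchanged; which one reads better is largely a matter of whether you think of the operation as selecting or as dropping.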

Of course, instead of 0, you can use any value you want as the cutoff for dropping.
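
To make the cutoff a parameter, the selection can be wrapped in a small function. This is a sketch under the assumption that every column is numeric (df.max() may fail or behave differently on mixed types); the function name keep_columns_with_max_above is hypothetical and not part of pandas.

```python
import pandas as pd


def keep_columns_with_max_above(df, cutoff=0):
    """Return only the columns whose maximum is strictly greater than cutoff.

    Hypothetical helper for illustration; assumes all columns are numeric.
    """
    return df.loc[:, df.max() > cutoff]


# Same values as the session above, hard-coded so the result is reproducible.
df = pd.DataFrame({
    "a": [-0.928912, -0.310504, -0.743000],
    "b": [0.220573, 0.847638, -1.099226],
    "c": [1.948065, -0.541496, -1.183567],
})

# With a cutoff of 1.0, only column "c" (max 1.948065) survives.
only_large = keep_columns_with_max_above(df, cutoff=1.0)

# With a cutoff of -1.0, every column's max is above it, so nothing is dropped.
everything = keep_columns_with_max_above(df, cutoff=-1.0)

print(list(only_large.columns), list(everything.columns))
```

Note that the comparison is strict, so a column whose maximum equals the cutoff is dropped; change > to >= if you want to keep it.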