How to drop columns which have same values in all rows via pandas or spark dataframe?


What we can do is use nunique to calculate the number of unique values in each column of the dataframe, and drop the columns which only have a single unique value:

    In [285]:
    nunique = df.nunique()
    cols_to_drop = nunique[nunique == 1].index
    df.drop(cols_to_drop, axis=1)

    Out[285]:
       index   id   name  data1
    0      0  345  name1      3
    1      1   12  name2      2
    2      5    2  name6      7
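For anyone who wants to run the nunique approach end to end, here is a minimal sketch. The sample frame is an assumption: the index, id, name and data1 values mirror the output above, while the constant columns (value, value2, value3, val5) are filled with made-up values. The helper name drop_constant_columns is hypothetical, not something from pandas.

```python
import pandas as pd

# Assumed sample data: the constant columns hold made-up values.
df = pd.DataFrame({
    "index": [0, 1, 5],
    "id": [345, 12, 2],
    "name": ["name1", "name2", "name6"],
    "value": [1, 1, 1],
    "value2": [1, 1, 1],
    "value3": [1, 1, 1],
    "data1": [3, 2, 7],
    "val5": [1, 1, 1],
})


def drop_constant_columns(frame, dropna=True):
    """Return a copy of frame without columns that hold exactly one distinct value."""
    # nunique() ignores NaN by default; dropna=False counts NaN as its own value.
    counts = frame.nunique(dropna=dropna)
    return frame.drop(columns=counts[counts == 1].index)


result = drop_constant_columns(df)
print(result.columns.tolist())
```

Note that with the default dropna=True a column that is entirely NaN has a count of 0, so it survives the `== 1` test; comparing with `<= 1` instead would drop such columns as well.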

Another way is to diff the numeric columns, take the absolute values and sum them; a column whose sum is zero never changes from one row to the next:

    In [298]:
    cols = df.select_dtypes([np.number]).columns
    diff = df[cols].diff().abs().sum()
    df.drop(diff[diff == 0].index, axis=1)

    Out[298]:
       index   id   name  data1
    0      0  345  name1      3
    1      1   12  name2      2
    2      5    2  name6      7
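To make the arithmetic behind the diff approach concrete, here is a small sketch on an assumed frame (the values in the value column are made up; the other numbers mirror the output above). It only inspects numeric columns, so a constant string column would not be detected this way.

```python
import numpy as np
import pandas as pd

# Assumed sample data; "value" is a made-up constant numeric column.
df = pd.DataFrame({
    "id": [345, 12, 2],
    "name": ["name1", "name2", "name6"],
    "value": [1, 1, 1],
    "data1": [3, 2, 7],
})

num_cols = df.select_dtypes([np.number]).columns

# Row-to-row differences; the first row of diff() is NaN and sum() skips it.
total_change = df[num_cols].diff().abs().sum()

# A total of zero means the column never changed between consecutive rows.
constant_cols = total_change[total_change == 0].index
result = df.drop(columns=constant_cols)

print(total_change)
print(result.columns.tolist())
```

For the id column the total is |12 - 345| + |2 - 12| = 343, for data1 it is 1 + 5 = 6, and for value it is 0, so only value is dropped.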

Another approach is to use the property that the standard deviation is zero for a column whose values are all the same:

    In [300]:
    cols = df.select_dtypes([np.number]).columns
    std = df[cols].std()
    cols_to_drop = std[std == 0].index
    df.drop(cols_to_drop, axis=1)

    Out[300]:
       index   id   name  data1
    0      0  345  name1      3
    1      1   12  name2      2
    2      5    2  name6      7

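Actually, the above can be done in a one-liner (numeric_only=True restricts std to the numeric columns, which pandas 2.0 and later no longer does by default):

    In [306]:
    df.drop(df.std(numeric_only=True)[df.std(numeric_only=True) == 0].index, axis=1)

    Out[306]:
       index   id   name  data1
    0      0  345  name1      3
    1      1   12  name2      2
    2      5    2  name6      7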
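A runnable sketch of the standard-deviation approach follows, with one change that is my own assumption rather than part of the answer: it compares against zero with np.isclose instead of ==, because with floating-point data the computed deviation of a constant column may come out as a tiny nonzero number rather than exactly 0. The sample values in the value and ratio columns are made up.

```python
import numpy as np
import pandas as pd

# Assumed sample data; "value" and "ratio" are made-up constant columns.
df = pd.DataFrame({
    "index": [0, 1, 5],
    "id": [345, 12, 2],
    "name": ["name1", "name2", "name6"],
    "value": [1, 1, 1],
    "ratio": [0.1, 0.1, 0.1],
    "data1": [3, 2, 7],
})

# numeric_only=True keeps the string column "name" out of the calculation.
std = df.std(numeric_only=True)

# Tolerance-based test (an assumption of this sketch, not the original answer).
cols_to_drop = std[np.isclose(std, 0)].index
result = df.drop(columns=cols_to_drop)

print(result.columns.tolist())
```

The default tolerance of np.isclose would also treat a column with extremely small variation as constant, so adjust atol if that matters for your data. Non-numeric columns are never dropped by this method.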


A simple one-liner (Python):

    df = df[[i for i in df if len(set(df[i])) > 1]]


Another solution is to call set_index on the columns that should not be compared, then compare the whole DataFrame with its first row (selected by iloc) using eq, and finally use boolean indexing to keep only the columns that vary:

    df1 = df.set_index(['index', 'id', 'name'])
    print(~df1.eq(df1.iloc[0]).all())
    value     False
    value2    False
    value3    False
    data1      True
    val5      False
    dtype: bool

    print(df1.loc[:, ~df1.eq(df1.iloc[0]).all()].reset_index())
       index   id   name  data1
    0      0  345  name1      3
    1      1   12  name2      2
    2      5    2  name6      7
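Here is the same first-row comparison as a self-contained sketch, using loc for the column selection. The sample frame is an assumption: the constant columns are filled with made-up values, and only index, id, name and data1 mirror the output above.

```python
import pandas as pd

# Assumed sample data; the constant columns hold made-up values.
df = pd.DataFrame({
    "index": [0, 1, 5],
    "id": [345, 12, 2],
    "name": ["name1", "name2", "name6"],
    "value": [1, 1, 1],
    "value2": [1, 1, 1],
    "data1": [3, 2, 7],
    "val5": [1, 1, 1],
})

# Move the columns that should not be compared into the index.
df1 = df.set_index(["index", "id", "name"])

# True for every column where at least one row differs from the first row.
varies = ~df1.eq(df1.iloc[0]).all()

# Keep only the varying columns and restore the key columns.
result = df1.loc[:, varies].reset_index()

print(varies)
print(result)
```

One thing to keep in mind: eq treats NaN as unequal to NaN, so a column that starts with NaN is reported as varying even if every row is NaN.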