How to drop columns which have the same value in all rows via pandas or spark dataframe?
What we can do is use nunique to calculate the number of unique values in each column of the DataFrame, and drop the columns which only have a single unique value:
In [285]:
nunique = df.nunique()
cols_to_drop = nunique[nunique == 1].index
df.drop(cols_to_drop, axis=1)

Out[285]:
   index   id   name  data1
0      0  345  name1      3
1      1   12  name2      2
2      5    2  name6      7
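For reference, here is a minimal sketch of what the example df might look like, reconstructed from the outputs shown in these answers; the exact names and values of the constant columns (value, value2, value3, val5) are assumptions:

import numpy as np
import pandas as pd

# Hypothetical reconstruction of the example DataFrame; value, value2,
# value3 and val5 are the constant columns every approach should drop.
df = pd.DataFrame({'index': [0, 1, 5],
                   'id': [345, 12, 2],
                   'name': ['name1', 'name2', 'name6'],
                   'value': [1, 1, 1],
                   'value2': [2, 2, 2],
                   'value3': [3, 3, 3],
                   'data1': [3, 2, 7],
                   'val5': [0, 0, 0]})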
Another way is to just diff the numeric columns, take the abs values, and sum them:
In [298]:
cols = df.select_dtypes([np.number]).columns
diff = df[cols].diff().abs().sum()
df.drop(diff[diff == 0].index, axis=1)

Out[298]:
   index   id   name  data1
0      0  345  name1      3
1      1   12  name2      2
2      5    2  name6      7
Another approach is to use the property that the standard deviation is zero for a column whose values are all the same:
In [300]:
cols = df.select_dtypes([np.number]).columns
std = df[cols].std()
cols_to_drop = std[std == 0].index
df.drop(cols_to_drop, axis=1)

Out[300]:
   index   id   name  data1
0      0  345  name1      3
1      1   12  name2      2
2      5    2  name6      7
Actually the above can be done in a one-liner:
In [306]:
df.drop(df.std()[(df.std() == 0)].index, axis=1)

Out[306]:
   index   id   name  data1
0      0  345  name1      3
1      1   12  name2      2
2      5    2  name6      7
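Note that in newer pandas releases df.std() no longer silently skips non-numeric columns such as name, so the one-liner may raise; a hedged adaptation that restricts the computation to numeric columns:

# Restrict std() to numeric columns explicitly, then drop the zero-variance ones.
std = df.std(numeric_only=True)
df = df.drop(std[std == 0].index, axis=1)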
A simple one-liner (Python):
df = df[[i for i in df if len(set(df[i])) > 1]]
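A related pandas-native variant (an addition, not part of the answer above) based on nunique, which also drops an all-NaN column by counting NaN as a single value:

# Keep only the columns with more than one distinct value, counting NaN as a value.
df = df.loc[:, df.nunique(dropna=False) > 1]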
Another solution is to set_index on the columns which are not compared, then compare the first row (selected by iloc) with the whole DataFrame using eq, and finally use boolean indexing:
df1 = df.set_index(['index','id','name'])
print (~df1.eq(df1.iloc[0]).all())
value     False
value2    False
value3    False
data1      True
val5      False
dtype: bool

print (df1.loc[:, (~df1.eq(df1.iloc[0]).all())].reset_index())
   index   id   name  data1
0      0  345  name1      3
1      1   12  name2      2
2      5    2  name6      7
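None of the answers above cover the Spark part of the question. A minimal PySpark sketch (an assumption, not from the answers above) is to count distinct values per column in one pass and drop the columns with exactly one distinct value:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical Spark DataFrame with one constant column ('value').
sdf = spark.createDataFrame(
    [(0, 345, 'name1', 1, 3), (1, 12, 'name2', 1, 2), (5, 2, 'name6', 1, 7)],
    ['index', 'id', 'name', 'value', 'data1'])

# Count distinct values in every column with a single aggregation.
counts = sdf.agg(*[F.countDistinct(c).alias(c) for c in sdf.columns]).collect()[0].asDict()

# Drop the columns with exactly one distinct value (countDistinct ignores nulls).
cols_to_drop = [c for c, n in counts.items() if n == 1]
sdf = sdf.drop(*cols_to_drop)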