Duplicate columns in Spark Dataframe

The best way would be to change the column name upstream ;)

However, it seems that is not possible, so there are a couple of options:

If the case of the columns are different("email" vs "Email") you can turn on case sensitivity:
```
     sql(sqlContext, "set spark.sql.caseSensitive=true")
```

If the column names are exactly the same, you will need to manually specify the schema and skip the first row to avoid the headers:

customSchema <- structType(structField("year", "integer"), structField("make", "string"),structField("model", "string"),structField("comment", "string"),structField("blank", "string"))df <- read.df(sqlContext, "cars.csv", source = "com.databricks.spark.csv", header="true", schema = customSchema)

r csv hadoop apache-spark sparkr

Try renaming the column.

You can select it by position instead of the select call.

colnames(df)[column number of interest] <- 'deleteme'

Alternatively you could just drop the column directly

 newdf <- df[,-x]

Where x is the column number you don't want.

Update:

If the above don't work, you could set header to false and then use the first row to rename columns:

  df <- read.df(    sqlContext,    FILE_PATH,    source = "com.databricks.spark.csv",    header = "FALSE",    mode = "DROPMALFORMED"  )#get first row to use as column namesmycolnames <- df[1,]#edit the dup column *in situ*mycolnames[x] <- 'IamNotADup'colnames(df) <- df[1,]# drop the first row:df <- df[-1,]

r csv hadoop apache-spark sparkr

You can also create a new dataframe using toDF.

Here's the same thing, for pyspark: Selecting or removing duplicate columns from spark dataframe

CodeHunter

Duplicate columns in Spark Dataframe

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last