Reading Rdata file with different encoding


Thanks to 42's comment, I've managed to write a function to recode the file:

    fix.encoding <- function(df, originalEncoding = "latin1") {
        numCols <- ncol(df)
        for (col in 1:numCols) Encoding(df[, col]) <- originalEncoding
        return(df)
    }

The meat here is the command Encoding(df[, col]) <- "latin1", which takes column col of dataframe df and marks its strings as latin1. Note that this only declares how the existing bytes should be interpreted; it does not re-encode them. Unfortunately, Encoding only accepts character vectors, not a whole dataframe, so I had to write a function that sweeps over all columns of the dataframe and applies the change to each one.
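To see what setting Encoding actually does to a string, here is a minimal sketch. The sample word and the variable names are made up for illustration; the string is built from raw bytes so the example does not depend on how the script file itself is saved.

```r
# Bytes of "façade" in latin1 (0xe7 is the c-cedilla).
latin1_bytes <- as.raw(c(0x66, 0x61, 0xe7, 0x61, 0x64, 0x65))
x <- rawToChar(latin1_bytes)

# Declare how the existing bytes should be read; the bytes stay the same.
Encoding(x) <- "latin1"
bytes_unchanged <- identical(charToRaw(x), latin1_bytes)

# enc2utf8() (or iconv()) is what actually re-encodes the string.
y <- enc2utf8(x)
utf8_bytes <- charToRaw(y)

print(bytes_unchanged)
print(Encoding(c(x, y)))
print(utf8_bytes)
```

If the text needs to be stored as UTF-8 afterwards (for example before writing it out again), a conversion step such as enc2utf8() would have to follow the Encoding assignment.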

Of course, if your problem is in just a couple of columns, you're better off just applying the Encoding to those columns instead of the whole dataframe (you can modify the function above to take a set of columns as input). Also, if you're facing the inverse problem, i.e. reading an R object created in Linux or Mac OS into Windows, you should use originalEncoding = "UTF-8".
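A variant that takes a set of columns might look like the sketch below. The name fix.encoding.cols and the cols argument are my own choices for illustration, not part of the original function, and the sample data frame is invented.

```r
# Hypothetical variant: only touch the columns named in `cols`.
fix.encoding.cols <- function(df, cols = names(df), originalEncoding = "latin1") {
  for (col in cols) {
    # Skip non-character columns, since Encoding<- rejects them.
    if (is.character(df[[col]])) {
      Encoding(df[[col]]) <- originalEncoding
    }
  }
  df
}

# Made-up example: two text columns holding the same latin1 bytes.
raw_word <- rawToChar(as.raw(c(0x66, 0x61, 0xe7, 0x61, 0x64, 0x65)))
df <- data.frame(a = raw_word, b = raw_word, n = 1L, stringsAsFactors = FALSE)

fixed <- fix.encoding.cols(df, cols = "a")
print(Encoding(fixed$a))
print(Encoding(fixed$b))
```

Only column a is marked; column b keeps whatever encoding mark it had, and the numeric column n is left alone.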


Following up on the previous answers, this is a minor update that makes the function work on factors and on dplyr's tibbles. Thanks for the inspiration.

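    fix.encoding <- function(df, originalEncoding = "UTF-8") {
        # Work on a plain data.frame so that df[, col] returns a vector
        df <- as.data.frame(df)
        for (col in seq_len(ncol(df))) {
            if (is.character(df[, col])) {
                Encoding(df[, col]) <- originalEncoding
            }
            if (is.factor(df[, col])) {
                # Factors keep their text in the levels
                Encoding(levels(df[, col])) <- originalEncoding
            }
        }
        # as_data_frame() is deprecated; as_tibble() is its replacement
        return(tibble::as_tibble(df))
    }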


Thank you for posting this. I took the liberty of modifying your function to handle a dataframe in which some columns are character and others are not. Without the check, an error occurs:

    > fix.encoding(adress)
    Error in `Encoding<-`(`*tmp*`, value = "latin1") : a character vector argument expected

So here is the modified function:

    fix.encoding <- function(df, originalEncoding = "latin1") {
        numCols <- ncol(df)
        for (col in 1:numCols) {
            if (class(df[, col]) == "character") {
                Encoding(df[, col]) <- originalEncoding
            }
        }
        return(df)
    }

However, this will not change the encoding of the level names in a "factor" column. Luckily, I found the following snippet, which converts all factors in your dataframe to character (this may not be the best approach, but in my case it was what I needed):

    i <- sapply(df, is.factor)
    df[i] <- lapply(df[i], as.character)