How to read data when some numbers contain commas as thousand separator? How to read data when some numbers contain commas as thousand separator? r r

How to read data when some numbers contain commas as thousand separator?


Not sure about how to have read.csv interpret it properly, but you can use gsub to replace "," with "", and then convert the string to numeric using as.numeric:

y <- c("1,200","20,000","100","12,111")as.numeric(gsub(",", "", y))# [1]  1200 20000 100 12111

This was also answered previously on R-Help (and in Q2 here).

Alternatively, you can pre-process the file, for instance with sed in unix.


You can have read.table or read.csv do this conversion for you semi-automatically. First create a new class definition, then create a conversion function and set it as an "as" method using the setAs function like so:

setClass("num.with.commas")setAs("character", "num.with.commas",         function(from) as.numeric(gsub(",", "", from) ) )

Then run read.csv like:

DF <- read.csv('your.file.here',    colClasses=c('num.with.commas','factor','character','numeric','num.with.commas'))


I want to use R rather than pre-processing the data as it makes it easier when the data are revised. Following Shane's suggestion of using gsub, I think this is about as neat as I can do:

x <- read.csv("file.csv",header=TRUE,colClasses="character")col2cvt <- 15:41x[,col2cvt] <- lapply(x[,col2cvt],function(x){as.numeric(gsub(",", "", x))})