Blank space not recognised as NA in fread
If you want to avoid the additional manipulation after reading the file, you could try using quote = FALSE when writing to csv. This prevents values from being wrapped in quotation marks (" "), so all missing values should now be read as NAs. It should look like this -
# also turned off row names to prevent an additional column when reading the file
write.csv(df, "tr.csv", quote = FALSE, row.names = FALSE)
Output -
tr1 <- fread("tr.csv", header = T, fill = T, sep = ",", na.strings = c("", NA), data.table = F, stringsAsFactors = FALSE)
tr1
  x1         x2   x3 x4
1 NA 1006678566 <NA> NA
2 NA         NA   ac  2
3 NA 1011160152 <NA>  3

tr2 <- read.table("tr.csv", fill = TRUE, header = T, sep = ",", na.strings = c("", " ", NA), stringsAsFactors = FALSE)
tr2
  x1         x2   x3 x4
1 NA 1006678566 <NA> NA
2 NA         NA   ac  2
3 NA 1011160152 <NA>  3
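The answer above does not show how df was built, so here is a minimal round-trip sketch with a hypothetical data frame shaped like the printed output. It uses base read.csv() so it runs without data.table installed; the fread() call from the answer could be swapped in. The column values and the temporary file path are assumptions for illustration only.

```r
# Hypothetical data frame resembling the output shown above (not the asker's real data).
df <- data.frame(
  x1 = c(NA, NA, NA),
  x2 = c(1006678566, NA, 1011160152),
  x3 = c(NA, "ac", NA),
  x4 = c(NA, 2, 3),
  stringsAsFactors = FALSE
)

# Write without quotes and without row names, as suggested in the answer.
path <- tempfile(fileext = ".csv")
write.csv(df, path, quote = FALSE, row.names = FALSE)

# Missing values are written as the bare token NA, with no surrounding quotes.
csv_lines <- readLines(path)

# Read back, treating both empty fields and the token NA as missing.
back <- read.csv(path, na.strings = c("", "NA"), stringsAsFactors = FALSE)

# Count missing values per column.
na_counts <- colSums(is.na(back))
print(na_counts)
```

Because the file contains no quotes, the NA token in the character column x3 is read as a missing value rather than as the literal string "NA".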
One thing that I found is that the issue comes from the way the data gets saved when we do a write.csv().
Open the csv file, hit delete on the blank cells in x4, and save. If you import it now, the NAs will show up in R.
To check:
apply(tr1, 2, function(x) length(which(is.na(x))))
V1 x1 x2 x3 x4
 0  3  1  2  1
If there is a csv file with blanks and we do fread using na.strings = c("", NA), the character columns also show NA for the blanks.
@SJB Use na.strings = c(NA_character_, "") as an argument in fread() and blank spaces/cells will be read as NA.
There are forms of NA for various data types. See help(NA): NA_character_, NA_real_, NA_integer_, etc.
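As a small base R illustration of the typed NA constants mentioned here (a sketch, not a statement about fread() internals): the variable names below are made up for the example.

```r
# Each NA constant carries its own type; plain NA is logical.
na_types <- c(
  plain     = typeof(NA),
  character = typeof(NA_character_),
  real      = typeof(NA_real_),
  integer   = typeof(NA_integer_)
)
print(na_types)

# Inside c() with a string, a plain NA is coerced to NA_character_,
# so these two vectors are identical in base R.
same_vector <- identical(c("", NA), c(NA_character_, ""[0], "")[c(2, 1)])
na_strings_a <- c("", NA)
na_strings_b <- c("", NA_character_)
print(identical(na_strings_a, na_strings_b))
```

Writing NA_character_ explicitly just makes the intended type visible when building the na.strings vector.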