Write UTF-8 files from R
This "answer" mainly serves to show that something odd is going on behind the scenes:
it seems "hīersumian" doesn't even make it into the data frame intact. In all cases the "ī" is converted to a plain "i".
    options("encoding" = "native.enc")
    t1 <- data.frame(a = c("hīersumian "), stringsAsFactors=FALSE)
    t1
    #             a
    # 1 hiersumian

    options("encoding" = "UTF-8")
    t1 <- data.frame(a = c("hīersumian "), stringsAsFactors=FALSE)
    t1
    #             a
    # 1 hiersumian

    options("encoding" = "UTF-16")
    t1 <- data.frame(a = c("hīersumian "), stringsAsFactors=FALSE)
    t1
    #             a
    # 1 hiersumian
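To tell whether the character is really lost or only displayed differently, it can help to look at code points and bytes instead of the printed output. The following is a minimal sketch, assuming the string is entered with a `\u` escape so that it does not depend on the encoding of the script or console; the variable names are only illustrative.

```r
s <- "h\u012bersumian"             # "\u012b" is the escape for ī (U+012B)

cp <- utf8ToInt(enc2utf8(s))       # Unicode code points, one per character
cp[2]                              # 299 (0x012B) if ī is intact, 105 if it became "i"

bytes <- charToRaw(enc2utf8(s))    # raw UTF-8 bytes of the string
bytes[2:3]                         # c4 ab is the UTF-8 encoding of ī

n_chars <- nchar(s, type = "chars")  # number of characters
n_bytes <- nchar(s, type = "bytes")  # one more than n_chars, since ī takes two bytes
```

If `cp[2]` comes back as 105, the conversion already happened when the string was created, before any file was written.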
The following sequence successfully writes "ǣmettigan" to the text file:
    t2 <- data.frame(a = c("ǣmettigan"), stringsAsFactors=FALSE)
    getOption("encoding")
    # [1] "native.enc"
    Encoding(t2[,"a"]) <- "UTF-16"
    write.table(t2, "test.txt", row.names=FALSE, col.names=FALSE, quote=FALSE)
It does not work with the "encoding" option set to "UTF-8" or "UTF-16", and additionally specifying "fileEncoding" leads either to defective output or to no output at all.
Somewhat disappointing, as so far I had managed to get all Unicode issues fixed somehow.
I may be missing something OS-specific, but data.table appears to have no problem with this (or perhaps more likely it's an update to R internals since this question was originally posed):
    library(data.table)

    t1 = data.table(a = c("hīersumian", "ǣmettigan"))
    tmp = tempfile()
    fwrite(t1, tmp)
    system(paste('cat', tmp))
    # a
    # hīersumian
    # ǣmettigan
    fread(tmp)
    #             a
    # 1: hīersumian
    # 2:  ǣmettigan
I found a blog post that basically says this comes down to the way Windows encodes text; the post has a lot more detail. The suggestion is to write the file in binary mode using
writeBin(charToRaw(x), con, endian="little")
https://tomizonor.wordpress.com/2013/04/17/file-utf8-windows/
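A minimal sketch of what such a binary write could look like end to end. The helper name `write_utf8` is hypothetical (it is not from the blog post), and the input is assumed to be a character vector that `enc2utf8()` can convert. The `endian` argument is left out here because the data are already raw bytes.

```r
# Hypothetical helper: write a character vector to a file as UTF-8 bytes,
# one element per line, bypassing any re-encoding by the connection.
write_utf8 <- function(x, path) {
  con <- file(path, open = "wb")   # binary mode: no encoding or newline translation
  on.exit(close(con))
  # Convert each element to UTF-8, take its raw bytes, and append a newline byte
  bytes <- unlist(lapply(enc2utf8(x), function(s) c(charToRaw(s), as.raw(0x0a))))
  writeBin(bytes, con)
  invisible(path)
}

x <- c("h\u012bersumian", "\u01e3mettigan")  # escapes avoid relying on the script's encoding
tmp <- tempfile(fileext = ".txt")
write_utf8(x, tmp)

# Read the file back, declaring its content to be UTF-8
back <- readLines(tmp, encoding = "UTF-8")
```

Reading the file back with `readLines(tmp, encoding = "UTF-8")` is one way to confirm that the special characters survived the round trip.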