
Write UTF-8 files from R


This "answer" is less a solution than a demonstration that something odd is going on behind the scenes:

It seems "hīersumian" doesn't even make it into the data frame: in every case the "ī" is converted to "i".

    options("encoding" = "native.enc")
    t1 <- data.frame(a = c("hīersumian "), stringsAsFactors=F)
    t1
    #             a
    # 1 hiersumian 

    options("encoding" = "UTF-8")
    t1 <- data.frame(a = c("hīersumian "), stringsAsFactors=F)
    t1
    #             a
    # 1 hiersumian 

    options("encoding" = "UTF-16")
    t1 <- data.frame(a = c("hīersumian "), stringsAsFactors=F)
    t1
    #             a
    # 1 hiersumian 
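To tell whether the "ī" is lost when the string is entered or only when it is printed, it may help to look at the bytes rather than the console output. The following is a minimal sketch, not part of the original answer; it assumes the string is entered with a `\u` escape so that the result does not depend on the encoding of the console or script file.

```r
# "hīersumian" written with a Unicode escape (U+012B is "ī"),
# so the literal is independent of the console/script encoding.
x <- "h\u012bersumian"

Encoding(x)        # the encoding mark R attached to the string
charToRaw(x)       # the underlying bytes
utf8ToInt(x)[2]    # code point of the second character

# Put the string into a data frame and compare bytes, not printed output.
t1 <- data.frame(a = x, stringsAsFactors = FALSE)
in_df <- identical(charToRaw(t1$a), charToRaw(x))
print(in_df)
```

If the bytes in the data frame match the bytes of `x`, the character is stored intact and any "i" on screen would come from how the console renders it.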

The following sequence successfully writes "ǣmettigan" to the text file:

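    t2 <- data.frame(a = c("ǣmettigan"), stringsAsFactors=F)
    getOption("encoding")
    # [1] "native.enc"
    # Note: Encoding<- only recognises "latin1", "UTF-8" and "bytes";
    # any other value, including "UTF-16", marks the string as "unknown".
    Encoding(t2[,"a"]) <- "UTF-16"
    write.table(t2,"test.txt",row.names=F,col.names=F,quote=F)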


This does not work with "encoding" set to "UTF-8" or "UTF-16", and additionally specifying "fileEncoding" leads to either a corrupted file or no output at all.
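Since the result seems to differ between setups, one way to check what a given combination actually produced is to read the file back as raw bytes and compare them with the UTF-8 encoding of the string. This is a rough sketch under my own assumptions (a temporary file, a `\u` escape for "ǣ", and `fileEncoding = "UTF-8"` as the setting under test); the outcome depends on the platform and locale, so no expected result is shown.

```r
x <- "\u01e3mettigan"   # "ǣmettigan" via a Unicode escape (U+01E3 is "ǣ")
t2 <- data.frame(a = x, stringsAsFactors = FALSE)

path <- tempfile(fileext = ".txt")
write.table(t2, path, row.names = FALSE, col.names = FALSE,
            quote = FALSE, fileEncoding = "UTF-8")

# Read back exactly what is on disk, without any decoding.
on_disk  <- readBin(path, what = "raw", n = file.size(path))
expected <- charToRaw(enc2utf8(x))

# TRUE if the file begins with the UTF-8 bytes of x (a line ending may follow).
starts_with_utf8 <- length(on_disk) >= length(expected) &&
  identical(on_disk[seq_along(expected)], expected)

print(on_disk)
print(starts_with_utf8)
```

An empty `on_disk` corresponds to the "no output" case, and a mismatch to the corrupted one.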

Somewhat disappointing, as until now I had managed to get every Unicode issue fixed somehow.


I may be missing something OS-specific, but data.table appears to have no problem with this (or perhaps more likely it's an update to R internals since this question was originally posed):

    library(data.table)

    t1 = data.table(a = c("hīersumian", "ǣmettigan"))
    tmp = tempfile()
    fwrite(t1, tmp)
    system(paste('cat', tmp))
    # a
    # hīersumian
    # ǣmettigan
    fread(tmp)
    #             a
    # 1: hīersumian
    # 2:  ǣmettigan


I found a blog post that basically says this comes down to the way Windows encodes text; the post goes into a lot more detail. The suggested fix is to write the file in binary mode using

    writeBin(charToRaw(x), con, endian="little")

https://tomizonor.wordpress.com/2013/04/17/file-utf8-windows/
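For completeness, here is how the binary-write approach could look end to end. This is a sketch based on the one-liner above, not code taken from the blog post: the temporary file, the explicit `enc2utf8()` call, and the manual "\n" line endings are my assumptions.

```r
x <- c("h\u012bersumian", "\u01e3mettigan")   # "hīersumian", "ǣmettigan"

path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")   # binary mode: bytes are written as given

for (line in enc2utf8(x)) {
  # UTF-8 bytes of the line followed by a single newline byte
  writeBin(c(charToRaw(line), as.raw(0x0a)), con)
}
close(con)

# Read the file back, telling R that the bytes are UTF-8.
back <- readLines(path, encoding = "UTF-8")
print(identical(back, x))
```

Because the connection is opened in binary mode, R does not re-encode the text on the way out, which is the step that appears to go wrong in the attempts above.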