Why is R reading UTF-8 header as text?


So I was going to give you instructions on how to manually open the file and check for and discard the BOM, but then I noticed this (in ?file):

As from R 3.0.0 the encoding "UTF-8-BOM" is accepted and will remove a Byte Order Mark if present (which it often is for files and webpages generated by Microsoft applications).

which means that if you have a sufficiently new R interpreter,

read.csv("my_file.txt", fileEncoding="UTF-8-BOM", ...other args...)

should do what you want.
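To try this out without touching real data, here is a minimal sketch. The sample file and its contents are made up for illustration: it writes a small CSV that starts with the UTF-8 BOM bytes, checks for them by hand, and then lets read.csv strip them.

```r
# Hypothetical sample file: a CSV that starts with the UTF-8 BOM (EF BB BF).
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("id,name\n1,alpha\n2,beta\n"), con)
close(con)

# Manual check: does the file begin with the three BOM bytes?
has_bom <- identical(readBin(path, what = "raw", n = 3L),
                     as.raw(c(0xEF, 0xBB, 0xBF)))
print(has_bom)

# Let R strip the BOM while reading (needs R >= 3.0.0, per ?file).
df <- read.csv(path, fileEncoding = "UTF-8-BOM")
print(names(df))
```

If has_bom is FALSE for your own file, the odd header probably has a different cause and this encoding setting will not change anything.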


Most of read.csv's arguments, fileEncoding included, are not named in its own signature. They are passed through ... to read.table, which read.csv wraps.

You can also call read.table directly:

 read.table("my_file.txt", header=TRUE, sep="\t", fileEncoding="UTF-8-BOM")
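A quick way to see how the two functions treat fileEncoding is to read the same file with both and compare the column names. This is only a sketch: the tab-separated sample file is invented for the example, and the variable names are arbitrary.

```r
# Hypothetical tab-separated file that starts with the UTF-8 BOM.
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("id\tname\n1\talpha\n2\tbeta\n"), con)
close(con)

# Same file, same encoding, read through both entry points.
via_table <- read.table(path, header = TRUE, sep = "\t",
                        fileEncoding = "UTF-8-BOM")
via_csv <- read.csv(path, sep = "\t", fileEncoding = "UTF-8-BOM")

print(names(via_table))
print(names(via_csv))
```

If both calls print the same clean names, fileEncoding is reaching read.table in either case.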


I had the same issue loading a CSV file with read.csv (with encoding="UTF-8-BOM"), read.table, and read_csv from the readr package. None of these attempts succeeded.

I could not leave the BOM in place: when subsetting my data (with either subset() or df[df$var=="value",]), the first row was not taken into account.

I finally found a workaround that made the BOM vanish: in read.csv, I passed a character vector of my column names through the col.names argument. This works like a charm, and I can subset my data without issues.

I am using R version 3.5.0.
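The col.names workaround can be sketched as follows. The file contents and the column names "id" and "name" are assumptions made up for the example; substitute your own. The header line is still read and skipped, but its BOM-prefixed names are replaced by the ones supplied.

```r
# Hypothetical CSV that starts with the UTF-8 BOM (EF BB BF).
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("id,name\n1,alpha\n2,beta\n"), con)
close(con)

# Override the header names so the first column is no longer BOM-prefixed.
df <- read.csv(path, header = TRUE, col.names = c("id", "name"))

# Subsetting on the first column now works by its clean name.
hits <- df[df$id == 1, ]
print(hits)
```

The vector given to col.names has to match the number of columns in the file; otherwise read.csv warns that the header and col.names differ in length.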