Why is R reading UTF-8 header as text?
So I was going to give you instructions on how to manually open the file and check for and discard the BOM, but then I noticed this (in ?file):
As from R 3.0.0 the encoding "UTF-8-BOM" is accepted and will remove a Byte Order Mark if present (which it often is for files and webpages generated by Microsoft applications).
which means that if you have a sufficiently new R interpreter,
read.csv("my_file.txt", fileEncoding="UTF-8-BOM", ...other args...)
should do what you want.
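A quick self-contained way to convince yourself (a sketch; the file is generated on the fly, and the exact mangled header name varies by platform and locale):

```r
# Write a tiny CSV prefixed with a UTF-8 BOM (0xEF 0xBB 0xBF),
# mimicking what Excel and other Microsoft tools produce
tmp <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("id,name\n1,Ana\n2,Bo\n")), tmp)

# Plain read: the BOM is typically glued onto the first column name
names(read.csv(tmp))[1]   # usually not a clean "id"

# With fileEncoding = "UTF-8-BOM" (R >= 3.0.0) the BOM is stripped
names(read.csv(tmp, fileEncoding = "UTF-8-BOM"))   # "id" "name"
```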
Most of the arguments in read.csv are dummy args, including fileEncoding. Use read.table instead:
read.table("my_file.txt", header=TRUE, sep="\t", fileEncoding="UTF-8")
I had the same issue loading a CSV file using read.csv (with encoding="UTF-8-BOM"), read.table, and read_csv from the readr package. None of these attempts proved successful.
I definitely could not work with the BOM tag in place, because when subsetting my data (using either subset() or df[df$var=="value",]), the first row was not taken into account.
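The failure mode can be reproduced in a few lines (a sketch with made-up data; the exact mangled column name depends on platform and locale, but lookups by the clean name fail either way):

```r
# A CSV with a leading UTF-8 BOM, as written by many Microsoft tools
tmp <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("var,x\nvalue,1\nvalue,2\n")), tmp)

df <- read.csv(tmp)      # BOM left in the header line
names(df)[1]             # mangled, e.g. "ï..var" -- not a clean "var"
df[df$var == "value", ]  # df$var is NULL here, so the subset comes back empty
```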
I finally found a workaround that made the BOM tag vanish. Using the read.csv function, I simply passed a character vector of column names via the col.names argument. This works like a charm and I can subset my data without issues.
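For the record, a sketch of that workaround (file and column names are made up for illustration; when col.names is supplied alongside header = TRUE, the BOM-tainted header line is still consumed but its names are discarded):

```r
# Read the file normally, but replace the header names with a clean vector;
# the first line is skipped as a header, so the BOM never reaches names(df)
df <- read.csv("my_file.csv",
               header = TRUE,
               col.names = c("var", "x"))

df[df$var == "value", ]   # subsetting now sees every matching row
```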
I use R version 3.5.0.