Why is R reading UTF-8 header as text?
So I was going to give you instructions on how to manually open the file and check for and discard the BOM, but then I noticed this (in ?file):
As from R 3.0.0 the encoding "UTF-8-BOM" is accepted and will remove a Byte Order Mark if present (which it often is for files and webpages generated by Microsoft applications).
which means that if you have a sufficiently new R interpreter,
read.csv("my_file.txt", fileEncoding="UTF-8-BOM", ...other args...)
should do what you want.
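A quick self-contained way to convince yourself (a sketch; the file is generated on the fly, and the exact mangled header name varies by platform and locale):

```r
# Write a tiny CSV prefixed with a UTF-8 BOM (0xEF 0xBB 0xBF),
# mimicking what Excel and other Microsoft tools produce
tmp <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("id,name\n1,Ana\n2,Bo\n")), tmp)

# Plain read: the BOM is typically glued onto the first column name
names(read.csv(tmp))[1]   # usually not a clean "id"

# With fileEncoding = "UTF-8-BOM" (R >= 3.0.0) the BOM is stripped
names(read.csv(tmp, fileEncoding = "UTF-8-BOM"))   # "id" "name"
```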
Most of the arguments in read.csv are dummy args, including fileEncoding. Use read.table instead:
read.table("my_file.txt", header=TRUE, sep="\t", fileEncoding="UTF-8")
I had the same issue loading a CSV file using read.csv (with encoding="UTF-8-BOM"), read.table, and read_csv from the readr package. None of these attempts proved successful.
I definitely could not work with the BOM tag in place, because when subsetting my data (using either subset() or df[df$var=="value",]), the first row was not taken into account.
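The failure mode can be reproduced in a few lines (a sketch with made-up data; the exact mangled column name depends on platform and locale, but lookups by the clean name fail either way):

```r
# A CSV with a leading UTF-8 BOM, as written by many Microsoft tools
tmp <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("var,x\nvalue,1\nvalue,2\n")), tmp)

df <- read.csv(tmp)      # BOM left in the header line
names(df)[1]             # mangled, e.g. "ï..var" -- not a clean "var"
df[df$var == "value", ]  # df$var is NULL here, so the subset comes back empty
```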
I finally found a workaround that made the BOM tag vanish. Using the read.csv function, I simply passed a character vector of column names via the col.names argument. This works like a charm and I can subset my data without issues.
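For the record, a sketch of that workaround (file and column names are made up for illustration; when col.names is supplied alongside header = TRUE, the BOM-tainted header line is still consumed but its names are discarded):

```r
# Read the file normally, but replace the header names with a clean vector;
# the first line is skipped as a header, so the BOM never reaches names(df)
df <- read.csv("my_file.csv",
               header = TRUE,
               col.names = c("var", "x"))

df[df$var == "value", ]   # subsetting now sees every matching row
```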
I use R version 3.5.0.