R - c() unexpectedly converts names of named vectors into UTF-8. Is this a bug?

r character-encoding

You should still see names(c(x)) == names(x) on your system. The encoding change by c() may be unintentional, but shouldn't affect your code in most scenarios.
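To check this on your own machine, something like the following minimal sketch may help. The vector `x` and its Latin-1 name are made up for illustration; the question's actual data is not reproduced here.

```r
# Hypothetical named vector whose name is a Latin-1 encoded string.
nm <- "fa\xE7ile"
Encoding(nm) <- "latin1"
x <- stats::setNames(1, nm)

# Declared encodings before and after c(); these may differ by platform
# and R version.
enc_before <- Encoding(names(x))
enc_after  <- Encoding(names(c(x)))

# Comparison translates between marked encodings, so the names still
# compare equal even if the declared encoding changed.
same_names <- all(names(c(x)) == names(x))
print(list(before = enc_before, after = enc_after, same = same_names))
```
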

On Windows, where R has historically not had a UTF-8 locale (this applies to R versions before 4.2), your safest bet is to convert all strings to UTF-8 first via enc2utf8(), and then stay in UTF-8. This also makes lookups by name safe.
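A rough sketch of the "convert once, then stay in UTF-8" approach, assuming a made-up key and lookup vector (`key`, `lookup`, and the values are illustrative only):

```r
# Hypothetical input: a Latin-1 string, as might arrive from a file on Windows.
key <- "fa\xE7ile"
Encoding(key) <- "latin1"

# Convert to UTF-8 once, at the boundary.
key_utf8 <- enc2utf8(key)
Encoding(key_utf8)  # "UTF-8"

# Normalise the names the same way, then look up with the UTF-8 key.
lookup <- c(10, 20)
names(lookup) <- enc2utf8(c(key, "plain"))
value <- lookup[[key_utf8]]
print(value)
```

Pure ASCII strings such as "plain" keep the encoding mark "unknown" after enc2utf8(), which is expected and harmless.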

Language symbols (as used in dplyr's group_by()) are an entirely different issue. For some reason they are always interpreted in the native encoding. (Try as.name(names(c(x))).) However, it's still best to keep the strings in UTF-8 and convert to the native encoding just before calling as.name(). This is what dplyr should be doing; we're just not quite there yet.
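The "convert to native just before as.name()" idea could look roughly like this. The helper name `as_native_name()` is hypothetical and is not part of dplyr or base R:

```r
# Hypothetical helper: keep strings in UTF-8 everywhere, and convert to the
# native encoding only at the point where a symbol is created.
as_native_name <- function(x) {
  stopifnot(is.character(x), length(x) == 1L)
  as.name(enc2native(x))
}

col_utf8 <- enc2utf8("fa\u00e7ile")  # column name held in UTF-8
sym <- as_native_name(col_utf8)
print(is.name(sym))
```

If the native encoding cannot represent a character, enc2native() substitutes an escape for it, so the resulting symbol may not match the original column name. That is one more reason for the ASCII-only advice.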

My recommendation is to use ASCII-only characters for column names when using dplyr on Windows. This requires some discipline if you're relying on tidyr::spread() for non-ASCII column contents. You could also consider switching to a system (OS X or Linux) that works with UTF-8 natively.
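If you want to enforce the ASCII-only rule, a small check along these lines might do. Both `is_ascii()` and the example column names are assumptions made for illustration:

```r
# Hypothetical helper: TRUE for strings made only of ASCII code points.
is_ascii <- function(x) {
  vapply(
    enc2utf8(x),
    function(s) all(utf8ToInt(s) < 128L),
    logical(1),
    USE.NAMES = FALSE
  )
}

cols <- c("id", "fa\u00e7ile", "value")  # made-up column names
ascii_ok <- is_ascii(cols)
offending <- cols[!ascii_ok]             # names that would need renaming
print(ascii_ok)
```

Running such a check right after tidyr::spread() would flag columns created from non-ASCII cell contents before they reach dplyr.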
