R - c() unexpectedly converts names of named vectors into UTF-8. Is this a bug?
You should still see names(c(x)) == names(x) on your system. The encoding change by c() may be unintentional, but shouldn't affect your code in most scenarios.
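A minimal sketch of that check, assuming a name that starts out marked as latin1 (the value "f\xfcr" is just an illustrative example, not taken from the question). Whether the mark changes after c() depends on your R version and platform, so the sketch prints it rather than asserting it:

```r
# Build a name whose bytes are latin1 and mark it as such.
nm <- "f\xfcr"
Encoding(nm) <- "latin1"

x <- 1
names(x) <- nm

# Encoding marks before and after c(); the second may differ from the first.
enc_before <- Encoding(names(x))
enc_after  <- Encoding(names(c(x)))

# == translates strings in different encodings before comparing,
# so the names still match even if the marks do not.
same_names <- all(names(c(x)) == names(x))

print(c(before = enc_before, after = enc_after))
print(same_names)
```

The comparison works on the translated text, not on the raw bytes, which is why a changed encoding mark usually goes unnoticed.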
On Windows, which doesn't have a UTF-8 locale, your safest bet is to convert all strings to UTF-8 first via enc2utf8(), and then stay in UTF-8. This will also enable safe lookups.
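As a rough illustration of "convert first, then stay in UTF-8", the sketch below normalizes the names of a vector and the lookup key with enc2utf8() before doing a lookup. The vector and its names are made up for the example:

```r
# One latin1-marked name and one plain ASCII name (illustrative values).
nm <- "f\xfcr"
Encoding(nm) <- "latin1"

x <- c(1, 2)
names(x) <- c(nm, "ascii")

# Convert all names to UTF-8 once, up front.
names(x) <- enc2utf8(names(x))
encodings <- Encoding(names(x))   # ASCII-only strings stay "unknown"

# Convert the lookup key the same way before using it.
key <- enc2utf8(nm)
value <- x[[key]]

print(encodings)
print(value)
```

The same pattern applies to data frame column names: names(df) <- enc2utf8(names(df)).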
Language symbols (as used in dplyr's group_by()) are an entirely different issue. For some reason they are always interpreted in the native encoding. (Try as.name(names(c(x))).) However, it's still best to keep the strings in UTF-8, and convert to the native encoding just before calling as.name(). This is what dplyr should be doing; we're just not quite there yet.
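One way to sketch the "convert to native just before as.name()" step is a small wrapper. The helper name to_symbol() is hypothetical and is not part of dplyr or base R; it only combines the base functions enc2utf8(), enc2native() and as.name():

```r
# Hypothetical helper: keep strings in UTF-8 everywhere else and
# switch to the native encoding only at the point a symbol is created.
to_symbol <- function(name) {
  stopifnot(is.character(name), length(name) == 1L)
  as.name(enc2native(enc2utf8(name)))
}

sym <- to_symbol("mpg")
print(sym)
print(is.name(sym))
```

Note that characters the native encoding cannot represent will not survive this conversion, which is one more reason to prefer ASCII-only names for anything that ends up as a symbol.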
My recommendation is to use ASCII-only characters for column names when using dplyr on Windows. This requires some discipline if you're relying on tidyr::spread() for non-ASCII column contents, because spread() turns those contents into column names. You could also consider switching to a system (OS X or Linux) that works with UTF-8 natively.
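If you want to enforce the ASCII-only rule, a small check along these lines may help. The helper has_non_ascii() and the example data frame are hypothetical; the check simply looks for any byte above 127 in each name:

```r
# Hypothetical helper: TRUE for every string containing a non-ASCII byte.
has_non_ascii <- function(x) {
  vapply(
    x,
    function(s) any(as.integer(charToRaw(s)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
}

# Example data frame with one ASCII and one non-ASCII column name.
nm <- "f\xfcr"
Encoding(nm) <- "latin1"
df <- data.frame(value = 1:2, other = 3:4)
names(df)[2] <- enc2utf8(nm)

flags <- has_non_ascii(names(df))
print(flags)
```

Running such a check right after a spread() step would flag the columns that need renaming before they reach dplyr.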