
file.encoding has no effect, LC_ALL environment variable does it


Note: I think I have finally nailed this down. I am not certain that it is right, but this is what I found out through some code reading and tests, and I don't have additional time to look into it further. If anyone is interested, they can check it out and tell me whether this answer is right or wrong - I would be glad :)

The reference I used was this tarball, available on OpenJDK's site: openjdk-6-src-b25-01_may_2012.tar.gz

  1. Java natively translates all strings to the platform's local encoding in this method: jdk/src/share/native/common/jni_util.c - JNU_GetStringPlatformChars(). The system property sun.jnu.encoding is used to determine the platform's encoding.

  2. The value of sun.jnu.encoding is set in jdk/src/solaris/native/java/lang/java_props_md.c - GetJavaProperties(), using the setlocale() function of libc. The LC_ALL environment variable determines the value of sun.jnu.encoding; a value given on the command line with the -Dsun.jnu.encoding option is ignored.

  3. The call to File.exists() is implemented in jdk/src/share/classes/java/io/File.java and returns

    return ((fs.getBooleanAttributes(this) & FileSystem.BA_EXISTS) != 0);

  4. getBooleanAttributes() is implemented natively (I am skipping the code-browsing steps through many files) in jdk/src/share/native/java/io/UnixFileSystem_md.c, in the function Java_java_io_UnixFileSystem_getBooleanAttributes0(). Here the macro WITH_FIELD_PLATFORM_STRING(env, file, ids.path, path) converts the path string to the platform's encoding.

  5. So conversion to the wrong encoding actually passes a wrong C string (char array) to the subsequent stat() call, which then reports that the file cannot be found (see the sketch below).

LESSON: LC_ALL is very important
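
Here is a minimal sketch of how to observe this (the class name, file name and locale values are just examples, not taken from the question). Create a file with a non-ASCII name from a UTF-8 shell, e.g. touch täst.txt, and then run the class under different values of LC_ALL:

    // ExistsCheck.java - a small sketch, not part of the JDK sources quoted above.
    // Expected on a typical UTF-8 Linux system:
    //   LC_ALL=en_US.UTF-8 java ExistsCheck   -> exists: true
    //   LC_ALL=POSIX       java ExistsCheck   -> exists: false
    import java.io.File;

    public class ExistsCheck {
        public static void main(String[] args) {
            // Both properties are derived from the environment at JVM startup.
            System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
            System.out.println("file.encoding    = " + System.getProperty("file.encoding"));

            File f = new File("t\u00e4st.txt"); // "täst.txt" - a non-ASCII file name
            System.out.println("exists: " + f.exists());
        }
    }

With LC_ALL=POSIX, sun.jnu.encoding falls back to a plain ASCII charset (ANSI_X3.4-1968 on my machine), the 'ä' cannot be represented, and - per step 5 above - stat() is called with a mangled path, so exists() returns false.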


I'm not sure where you read about file.encoding. I don't see it mentioned among the other standard properties documented for System.getProperties. But judging from my experiments, it seems that this value influences the encoding of file content, not file names. System.out in particular will not print non-ASCII characters if file.encoding is POSIX.
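
A rough way to see that distinction (the file names and the string below are made up for the example): write the same non-ASCII string once with the default charset and once with an explicit one, and compare the resulting file sizes.

    // ContentEncoding.java - a sketch; the default charset (driven by file.encoding)
    // determines the bytes written as *content*, while the file *names* are converted
    // separately (via sun.jnu.encoding, as the other answer describes).
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.io.Writer;

    public class ContentEncoding {
        public static void main(String[] args) throws IOException {
            String s = "gr\u00fc\u00dfe"; // "grüße" - contains non-ASCII characters

            // Uses the default charset, i.e. whatever file.encoding resolved to at startup.
            Writer defaultWriter = new FileWriter("default.txt");
            defaultWriter.write(s);
            defaultWriter.close();

            // Uses an explicit charset, independent of file.encoding.
            Writer utf8Writer = new OutputStreamWriter(
                    new FileOutputStream("explicit-utf8.txt"), "UTF-8");
            utf8Writer.write(s);
            utf8Writer.close();

            // The sizes differ whenever the default charset is not UTF-8
            // (e.g. 5 bytes vs 7 bytes with ISO-8859-1 as the default).
            System.out.println("default.txt:       " + new File("default.txt").length() + " bytes");
            System.out.println("explicit-utf8.txt: " + new File("explicit-utf8.txt").length() + " bytes");
        }
    }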

On the other hand, the Linux way to decide which encoding applies to file names is the LC_CTYPE facet of the current locale setting. I see no reason why Java should override this. As many other platforms (Windows in particular) always use Unicode for file names, not bytes, there is little point in exposing the byte-level details of the file system to a Java application.


Please see bug 4163515 at java.com. It explains that:

  1. file.encoding is specific to Sun's (now Oracle's) implementation of the JVM - other implementations may not support it
  2. It should be considered read-only
  3. To change it, you have to modify the environment in which the JVM runs (which is what you did with LC_ALL)

Also note that even if changing file.encoding "works" on your platform, you should not do it, as it does not change the default encoding used by the Oracle JVM in general, only in some subsystems. As the bug shows, the default encoding used by the String constructors that take byte arrays is unaffected by this setting.
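
The details differ between JDK versions, but the following sketch (my own, not taken from the bug report) illustrates the "read-only" point: overriding file.encoding at runtime changes what System.getProperty reports, yet the charset that new String(byte[]) falls back to is, in practice, the one resolved during JVM startup.

    // FileEncodingOverride.java - a sketch of the behaviour described in the bug report.
    import java.nio.charset.Charset;

    public class FileEncodingOverride {
        public static void main(String[] args) {
            System.setProperty("file.encoding", "ISO-8859-1");

            // The property itself now reports the new value...
            System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
            // ...but the default charset is normally resolved during startup and does not follow.
            System.out.println("defaultCharset() = " + Charset.defaultCharset());

            // UTF-8 bytes for "a-umlaut"; in practice they are still decoded with the
            // startup default charset, not with the ISO-8859-1 value set above.
            byte[] utf8Bytes = { (byte) 0xC3, (byte) 0xA4 };
            System.out.println(new String(utf8Bytes));
        }
    }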