file.encoding has no effect, LC_ALL environment variable does it
Note: I think I have finally nailed this down, though I cannot confirm that it is right. This is what I found from reading the code and running some tests, and I don't have more time to look into it. If anyone is interested, please check it and tell me whether this answer is right or wrong; I would be glad :)
The reference I used was from this tarball available at OpenJDK's site: openjdk-6-src-b25-01_may_2012.tar.gz
Java natively translates all strings to the platform's local encoding in this method: jdk/src/share/native/common/jni_util.c - JNU_GetStringPlatformChars(). The system property sun.jnu.encoding is used to determine the platform's encoding.
The value of sun.jnu.encoding is set at jdk/src/solaris/native/java/lang/java_props_md.c - GetJavaProperties() using the setlocale() function of libc. The environment variable LC_ALL is used to set the value of sun.jnu.encoding. A value given at the command prompt using the -Dsun.jnu.encoding option to Java is ignored.
The call to File.exists() is coded in the file jdk/src/share/classes/java/io/File.java and it returns as return ((fs.getBooleanAttributes(this) & FileSystem.BA_EXISTS) != 0);
getBooleanAttributes() is natively coded (and I am skipping steps in code browsing through many files) in jdk/src/share/native/java/io/UnixFileSystem_md.c in the function Java_java_io_UnixFileSystem_getBooleanAttributes0(). Here the macro WITH_FIELD_PLATFORM_STRING(env, file, ids.path, path) converts the path string to the platform's encoding.
So a conversion to the wrong encoding will actually send a wrong C string (char array) to the subsequent call to stat(), which will then report that the file cannot be found.
LESSON: LC_ALL is very important.
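To make the failure mode concrete, here is a minimal sketch (not the JDK's actual native code) that approximates the name conversion with String.getBytes(). The class name PlatformNameBytes, the helper hex() and the sample file name are made up for illustration, and US-ASCII is only an assumed stand-in for the encoding a POSIX locale would select. It needs Java 7 or later for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class PlatformNameBytes {
    // Formats a byte array as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Properties discussed above; the values depend on the environment.
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));

        // Hypothetical file name containing one non-ASCII character (e with acute).
        String name = "\u00E9.txt";

        // Bytes the file system would expect if the name was stored as UTF-8.
        System.out.println("UTF-8:    " + hex(name.getBytes(StandardCharsets.UTF_8)));

        // Bytes produced when the platform encoding is ASCII-only (assumption):
        // the unmappable character is replaced by '?', so the name no longer matches.
        System.out.println("US-ASCII: " + hex(name.getBytes(StandardCharsets.US_ASCII)));
    }
}
```

If the two hex dumps differ, a lookup with the second byte sequence cannot find a file that was stored under the first one, which would be consistent with File.exists() returning false.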
I'm not sure where you read about file.encoding. I don't see it mentioned with the other standard properties as documented with System.getProperties. But judging from my experiments, it seems that this value influences the encoding of file content, not file names. System.out in particular will not print non-ASCII characters if file.encoding is POSIX.
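The effect on content can be reproduced without touching the locale, by giving a PrintStream an explicit charset. This is only a sketch: ContentEncodingDemo and printWith() are hypothetical names, and US-ASCII is assumed here as a stand-in for the ASCII-only default that a POSIX locale would produce.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ContentEncodingDemo {
    // Prints the text through a PrintStream using the named charset and
    // returns the raw bytes that reached the underlying stream.
    static byte[] printWith(String charsetName, String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // What this JVM uses for content when no charset is given explicitly.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        String text = "\u00E9"; // e with acute
        // With UTF-8 the character is written as two bytes.
        System.out.println("UTF-8 length:    " + printWith("UTF-8", text).length);
        // With US-ASCII it cannot be encoded and is written as a single '?'.
        System.out.println("US-ASCII first byte: " + (char) printWith("US-ASCII", text)[0]);
    }
}
```

System.out is itself a PrintStream created with the default charset, so the same replacement would be expected there when that default is ASCII-only.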
On the other hand, the Linux way to decide which encoding applies to file names is the LC_CTYPE facet of the current locale setting. I see no reason why Java should override this. Since many other platforms (Windows in particular) always use Unicode for file names, not bytes, there is little point in exposing the byte-level details of the file system to a Java application.
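For reference, the LC_CTYPE facet is normally resolved from the environment in the order LC_ALL, then LC_CTYPE, then LANG, with empty values treated as unset. The sketch below mirrors that lookup in Java so you can see which variable would win in the environment the JVM was started from. EffectiveCtype and effectiveCtype() are hypothetical names, the "C" fallback is an assumption (the real default is implementation-defined), and the JVM itself does this through setlocale(), not through code like this.

```java
import java.util.Map;

public class EffectiveCtype {
    // Returns the first non-empty value among LC_ALL, LC_CTYPE and LANG,
    // falling back to "C" (assumed default) when none of them is set.
    static String effectiveCtype(Map<String, String> env) {
        for (String key : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            String value = env.get(key);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return "C";
    }

    public static void main(String[] args) {
        // Inspect the environment this JVM inherited.
        System.out.println("effective LC_CTYPE = " + effectiveCtype(System.getenv()));
        // Compare with what the JVM derived from it.
        System.out.println("sun.jnu.encoding   = " + System.getProperty("sun.jnu.encoding"));
    }
}
```

This also shows why setting LC_ALL has an effect even when LANG is already configured: LC_ALL takes precedence over the other two.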
Please see bug 4163515 at java.com. It explains that:
- file.encoding is specific to the Sun (now Oracle) implementation of the JVM; others may not support it
- It should be considered read-only
- To change it, you must modify the environment in which the JVM runs (which is what you did with LC_ALL)
Also note that even if changing file.encoding "works" for your platform, you should not do that, as it does not change the default encoding used by the Oracle JVM in general, but only in some subsystems. As the bug shows, the default encoding used by String constructors taking byte arrays is unaffected by this setting.
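A quick way to check the read-only behaviour yourself is to set the property at runtime and compare Charset.defaultCharset() before and after. This is a minimal sketch with hypothetical names (RuntimeFileEncoding, defaultCharsetChanges()); it assumes the JVM caches the default charset once it has been read, which is the behaviour I would expect from Oracle/OpenJDK builds but is not guaranteed for every JVM.

```java
import java.nio.charset.Charset;

public class RuntimeFileEncoding {
    // Sets file.encoding at runtime, reports whether the default charset
    // changed as a result, and restores the original property value.
    static boolean defaultCharsetChanges(String newEncoding) {
        String original = System.getProperty("file.encoding");
        Charset before = Charset.defaultCharset();
        try {
            System.setProperty("file.encoding", newEncoding);
            return !before.equals(Charset.defaultCharset());
        } finally {
            if (original != null) {
                System.setProperty("file.encoding", original);
            } else {
                System.clearProperty("file.encoding");
            }
        }
    }

    public static void main(String[] args) {
        // Pick an encoding that differs from the current default.
        String other = Charset.defaultCharset().name().equals("UTF-8") ? "ISO-8859-1" : "UTF-8";
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("changes after setProperty(" + other + ")? "
                + defaultCharsetChanges(other));
    }
}
```

Since new String(byte[]) decodes with the default charset, an unchanged default charset here is in line with the bug report's point about String constructors.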