Default character encoding for java console output


I'm assuming that your console still runs under cmd.exe. I doubt your console is really expecting UTF-8 - more likely it is using an OEM DOS encoding (e.g. 850 or 437).

Java will encode bytes using the default encoding set during JVM initialization.
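
If you want to see which default a given JVM picked up, a minimal sketch like the following will print it (the class name is just for illustration; the output depends on your platform, locale and any -Dfile.encoding flag):

import java.nio.charset.Charset;

class ShowEncoding {
  public static void main(String[] args) {
    // The charset the JVM selected at start-up (the default encoding mentioned above)
    System.out.println(Charset.defaultCharset());
    // The system property it is normally derived from
    System.out.println(System.getProperty("file.encoding"));
  }
}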

Reproducing on my PC:

java Foo

Java encodes as windows-1252; console decodes as IBM850. Result: mojibake.

java -Dfile.encoding=UTF-8 Foo

Java encodes as UTF-8; console decodes as IBM850. Result: mojibake.

cat test.txt

cat decodes the file as UTF-8; cat encodes as IBM850; console decodes as IBM850. Result: correct output.

java Foo | cat

Java encodes as windows-1252; cat decodes as windows-1252; cat encodes as IBM850; console decodes as IBM850. Result: correct output.

java -Dfile.encoding=UTF-8 Foo | cat

Java encodes as UTF-8; cat decodes as UTF-8; cat encodes as IBM850; console decodes as IBM850. Result: correct output.
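
(The source of Foo is not shown in the question; a minimal class that would reproduce the behaviour could be as simple as the following - the class name and the string are only assumptions.)

class Foo {
  public static void main(String[] args) {
    // Print a string containing non-ASCII characters; the bytes written
    // depend on the JVM's default encoding described above.
    System.out.println("xxäñxx");
  }
}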

This implementation of cat must use heuristics to determine whether the character data is UTF-8 or not, and then transcode the data from either UTF-8 or ANSI (e.g. windows-1252) to the console encoding (e.g. IBM850).

This can be confirmed with the following commands:

$ java HexDump utf8.txt
78 78 c3 a4 c3 b1 78 78
$ cat utf8.txt
xxäñxx
$ java HexDump ansi.txt
78 78 e4 f1 78 78
$ cat ansi.txt
xxäñxx

The cat command can make this determination because e4 f1 is not a valid UTF-8 sequence.
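
As an illustration only (this is not how that cat is implemented), the same kind of check can be expressed in Java with a CharsetDecoder, which rejects malformed input such as e4 f1:

import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

class Utf8Check {
  public static void main(String[] args) {
    byte[] utf8 = {0x78, 0x78, (byte) 0xc3, (byte) 0xa4, (byte) 0xc3, (byte) 0xb1, 0x78, 0x78};
    byte[] ansi = {0x78, 0x78, (byte) 0xe4, (byte) 0xf1, 0x78, 0x78};
    System.out.println(isValidUtf8(utf8)); // true
    System.out.println(isValidUtf8(ansi)); // false: e4 starts a multi-byte sequence, f1 is not a valid continuation byte
  }

  static boolean isValidUtf8(byte[] data) {
    try {
      // A freshly created decoder reports malformed input as an exception by default
      StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(data));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }
}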

You can correct the Java output by making the encoding Java uses for System.out match the encoding the console actually decodes.
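
For example, one way to do that (a sketch; "IBM850" is an assumption - use whatever code page your console reports, e.g. via chcp) is to replace System.out with a PrintStream that encodes to the console's code page:

import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ConsoleOut {
  public static void main(String[] args) throws UnsupportedEncodingException {
    // Re-wrap standard out so that it encodes to the console's code page
    System.setOut(new PrintStream(new FileOutputStream(FileDescriptor.out), true, "IBM850"));
    System.out.println("xxäñxx");
  }
}

When the program is attached to an interactive console, java.io.Console (obtained via System.console()) also provides a writer intended to use the console's own encoding.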

HexDump is a trivial Java application:

import java.io.*;

class HexDump {
  public static void main(String[] args) throws IOException {
    try (InputStream in = new FileInputStream(args[0])) {
      int r;
      // Print every byte of the file as two hex digits
      while ((r = in.read()) != -1) {
        System.out.format("%02x ", 0xFF & r);
      }
      System.out.println();
    }
  }
}