Reading UTF-8 - BOM marker

java file encoding

In Java, you have to consume the UTF-8 BOM manually if it is present. This behaviour is documented in the Java bug database, here and here. It will not be fixed for now, because a fix would break existing tools such as JavaDoc or XML parsers. Apache Commons IO provides a BOMInputStream to handle this situation.
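Consuming the BOM manually needs only the standard library. The following is a minimal sketch. It assumes the input is UTF-8, and the class and method names (Utf8BomReader, skipUtf8Bom, utf8Reader, readAll) are hypothetical. It reads the first three bytes through a java.io.PushbackInputStream and pushes them back if they are not the UTF-8 BOM (EF BB BF).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class: skips a leading UTF-8 BOM using only java.io
public class Utf8BomReader {

    // EF BB BF is the UTF-8 encoding of the byte order mark U+FEFF
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns a stream positioned after the BOM if one is present, or at the first byte otherwise
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or end of stream
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        if (n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM)) {
            return pushback; // BOM consumed
        }
        if (n > 0) {
            pushback.unread(head, 0, n); // not a BOM: give the bytes back
        }
        return pushback;
    }

    // A UTF-8 Reader that never yields a leading U+FEFF coming from a BOM
    public static Reader utf8Reader(InputStream in) throws IOException {
        return new InputStreamReader(skipUtf8Bom(in), StandardCharsets.UTF_8);
    }

    // Reads everything from the Reader into a String
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int r;
        while ((r = reader.read(buf)) != -1) {
            sb.append(buf, 0, r);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        String text = readAll(utf8Reader(new ByteArrayInputStream(withBom)));
        System.out.println(text.length()); // prints 2
    }
}
```

Because the bytes are pushed back when they are not a BOM, the same wrapper can be used for files written with or without one.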

Take a look at this solution: Handle UTF8 file with BOM

The easiest fix is probably just to remove the resulting \uFEFF character from the decoded string, since it is extremely unlikely to appear for any other reason.

tmp = tmp.replace("\uFEFF", "");
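Applied to a whole file, that replace call might look like the sketch below. It assumes the file is small enough to read into memory at once; the class and method names (StripBom, stripBom, readWithoutBom) are hypothetical. Java's UTF-8 decoder keeps the BOM as the character U+FEFF, so it is removed after decoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class: decode as UTF-8, then strip U+FEFF
public class StripBom {

    // The UTF-8 decoder does not drop a BOM; it decodes it as the character U+FEFF
    public static String stripBom(String s) {
        return s.replace("\uFEFF", "");
    }

    // Reads the whole file as UTF-8, then removes any U+FEFF characters
    public static String readWithoutBom(Path path) throws IOException {
        String text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        return stripBom(text);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bom-demo", ".txt");
        try {
            // EF BB BF is the UTF-8 encoding of the BOM, followed by "hello"
            Files.write(file, new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'e', 'l', 'l', 'o'});
            String text = readWithoutBom(file);
            System.out.println(text.length()); // prints 5
        } finally {
            Files.delete(file);
        }
    }
}
```

Without the replace call the decoded string would have length 6, with U+FEFF as its first character.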

Also see this Guava bug report

Use the Apache Commons IO library.

Class: org.apache.commons.io.input.BOMInputStream

Example usage:

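import java.io.*;
import org.apache.commons.io.ByteOrderMark;
import org.apache.commons.io.input.BOMInputStream;

String defaultEncoding = "UTF-8";
InputStream inputStream = new FileInputStream(someFileWithPossibleUtf8Bom);
try {
    // With this constructor, BOMInputStream detects a UTF-8 BOM and skips it
    BOMInputStream bOMInputStream = new BOMInputStream(inputStream);
    // getBOM() returns null when the stream has no BOM
    ByteOrderMark bom = bOMInputStream.getBOM();
    String charsetName = bom == null ? defaultEncoding : bom.getCharsetName();
    InputStreamReader reader = new InputStreamReader(new BufferedInputStream(bOMInputStream), charsetName);
    // use reader
} finally {
    inputStream.close();
}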

CodeHunter