remove non-UTF-8 characters from xml with declared encoding=utf-8 - Java

java xml encoding utf-8

1) I get the XML as a Java String with £ in it (I don't have access to the interface right now, but I probably get the XML as a Java String). Can I use replaceAll("£", "") to get rid of this character?

I assume you actually want to get rid of non-ASCII characters, since you're talking about a "legacy" side. You can remove everything outside the printable ASCII range with the following regex:

string = string.replaceAll("[^\\x20-\\x7e]", "");
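As a minimal sketch of the regex above in use (the class and method names are made up for illustration), the snippet below strips the pound sign from a small XML string. Note that the range \x20-\x7e also excludes tabs and line breaks, so those are removed as well.

```java
public class AsciiFilter {
    // Hypothetical helper: keeps only printable ASCII (space through tilde).
    static String stripNonPrintableAscii(String input) {
        return input.replaceAll("[^\\x20-\\x7e]", "");
    }

    public static void main(String[] args) {
        // U+00A3 is the pound sign, written as an escape to avoid source-encoding issues.
        String xml = "<price>\u00a35</price>";
        System.out.println(stripNonPrintableAscii(xml)); // prints <price>5</price>
    }
}
```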

2) I get the XML as an array of bytes. How do I handle this operation safely in that case?

You need to wrap the byte[] in a ByteArrayInputStream so that you can read it as a UTF-8 encoded character stream through an InputStreamReader, where you specify the encoding, and then use a BufferedReader to read it line by line.

E.g.

BufferedReader reader = null;
try {
    reader = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(bytes), "UTF-8"));
    for (String line; (line = reader.readLine()) != null;) {
        line = line.replaceAll("[^\\x20-\\x7e]", "");
        // ...
    }
    // ...
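For reference, here is a self-contained variant of the same idea using try-with-resources, which closes the reader automatically. This is only a sketch: the class name, the clean method, and the choice to rejoin lines with '\n' are assumptions, not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ByteArrayXmlCleaner {
    // Hypothetical helper: decodes the bytes as UTF-8, filters each line,
    // and rejoins the lines with '\n' (an assumption made for this sketch).
    static String clean(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            for (String line; (line = reader.readLine()) != null;) {
                out.append(line.replaceAll("[^\\x20-\\x7e]", "")).append('\n');
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "<a>\u00a31</a>\n<b>2</b>".getBytes(StandardCharsets.UTF_8);
        System.out.print(clean(bytes));
    }
}
```

Reading line by line keeps the per-line filter from the answer, but the original line terminators are lost, which is why the sketch re-adds them explicitly.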

UTF-8 is an encoding; Unicode is a character set. The GBP symbol (£, U+00A3) is definitely in the Unicode character set and is therefore representable in UTF-8.

If you do in fact mean UTF-8, and you are actually trying to remove byte sequences that are not the valid encoding of a character in UTF-8, then...

CharsetDecoder utf8Decoder = Charset.forName("UTF-8").newDecoder();
utf8Decoder.onMalformedInput(CodingErrorAction.IGNORE);
utf8Decoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
ByteBuffer bytes = ...;
CharBuffer parsed = utf8Decoder.decode(bytes);
...
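A runnable sketch of the decoder approach follows. The class name, the method name, and the sample bytes are assumptions chosen for illustration. The input contains one valid two-byte encoding of the pound sign (0xC2 0xA3) and one stray 0xA3 byte, as you might see when Latin-1 data is mislabeled as UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class MalformedUtf8Dropper {
    // Hypothetical helper: decodes as UTF-8 and silently drops invalid byte sequences.
    static String decodeLeniently(byte[] raw) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Should not occur with IGNORE actions; rethrown unchecked just in case.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 0xC2 0xA3 is the valid UTF-8 form of U+00A3; the lone 0xA3 is malformed.
        byte[] raw = {'a', (byte) 0xC2, (byte) 0xA3, 'b', (byte) 0xA3, 'c'};
        String cleaned = decodeLeniently(raw);
        // The valid pound sign survives; the stray byte is dropped.
        System.out.println(cleaned.equals("a\u00a3bc")); // prints true
    }
}
```

If you would rather see where the bad bytes were, CodingErrorAction.REPLACE substitutes a replacement character instead of dropping the sequence.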

"test text".replaceAll("[^\\u0000-\\uFFFF]", "");

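This removes every character that needs 4 bytes in UTF-8 (any code point above U+FFFF) from the string. That can be necessary, for example, before writing to a MySQL InnoDB VARCHAR column whose utf8 character set stores at most 3 bytes per character.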
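To make the effect concrete, here is a small sketch (class and method names are hypothetical) that applies the same regex to a string containing an emoji, U+1F600, which lies above U+FFFF, and a pound sign, which does not.

```java
public class SupplementaryStripper {
    // Hypothetical helper: removes code points above U+FFFF (4 bytes in UTF-8).
    static String stripSupplementary(String input) {
        return input.replaceAll("[^\\u0000-\\uFFFF]", "");
    }

    public static void main(String[] args) {
        // U+1F600 is stored as a surrogate pair in a Java String.
        String emoji = new String(Character.toChars(0x1F600));
        String text = "ok " + emoji + " \u00a3";
        // The emoji is removed; the pound sign (U+00A3) stays.
        System.out.println(stripSupplementary(text).equals("ok  \u00a3")); // prints true
    }
}
```

This works because Java's regex engine matches by code point, so the surrogate pair is treated as a single character outside the negated range.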
