Java Clipboard: Paste HTML from Firefox on Linux


I believe the problem is that he reads the clipboard as US-ASCII, then converts the result to Unicode and expects the German umlauts to stay intact. US-ASCII is a 7-bit charset, so it does not contain the German umlauts; they are already lost once the clipboard has been read as US-ASCII.

    public class CharsetDemo {

        public static void main(String[] args) throws Exception {
            byte[] bytes;

            // convert the German umlaut to bytes in US-ASCII charset
            bytes = "ö".getBytes("US-ASCII");
            System.out.println("US-ASCII");
            System.out.println("bytes : " + asHexString(bytes));
            System.out.println("string: " + new String(bytes, "US-ASCII"));
            System.out.println();

            // create a unicode string from the US-ASCII bytes
            String utf8String = new String(bytes, "UTF-8");
            bytes = utf8String.getBytes("UTF-8");
            System.out.println("UTF-8");
            System.out.println("bytes : " + asHexString(bytes));
            System.out.println("string: " + utf8String);
            System.out.println();

            // convert the German umlaut to bytes in ISO-8859-1 charset
            bytes = "ö".getBytes("ISO-8859-1");
            System.out.println("ISO 8859-1");
            System.out.println("bytes : " + asHexString(bytes));
            System.out.println("string: " + new String(bytes, "ISO-8859-1"));
            System.out.println();

            // create a unicode string from the ISO-8859-1 bytes
            utf8String = new String(bytes, "UTF-8");
            bytes = utf8String.getBytes("UTF-8");
            System.out.println("UTF-8");
            System.out.println("bytes : " + asHexString(bytes));
            System.out.println("string: " + utf8String);
            System.out.println();

            // bytes of the "REPLACEMENT CHARACTER" (U+FFFD) in UTF-8
            System.out.println("replacement character bytes: "
                    + asHexString("\uFFFD".getBytes("UTF-8")));
        }

        // format each byte as upper-case hex, separated by spaces
        static String asHexString(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) {
                sb.append(String.format("%X ", b));
            }
            return sb.toString();
        }
    }

output

    US-ASCII
    bytes : 3F
    string: ?  <--- the question mark is the substitute the encoder writes, as "ö" cannot be encoded in US-ASCII

    UTF-8
    bytes : 3F
    string: ?

    ISO 8859-1
    bytes : F6
    string: ö

    UTF-8
    bytes : EF BF BD  <-- the "REPLACEMENT CHARACTER" (U+FFFD), as a lone "F6" byte is not a valid UTF-8 sequence
    string: �

    replacement character bytes: EF BF BD
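
If you want to detect this kind of loss instead of silently getting "?" or "�", something along these lines might help. It is only a sketch: the class name CharsetRoundTrip and the helpers roundTrip and decodeStrict are made up for illustration, and it works on plain byte arrays rather than on the real clipboard. The umlaut is written as the escape \u00F6 so the result does not depend on the encoding of the source file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Encode and decode with the same charset. String.getBytes and
    // new String are lenient: unmappable characters are silently replaced.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // Decode strictly: throw an exception instead of inserting U+FFFD
    // when the bytes are not valid in the given charset.
    public static String decodeStrict(byte[] bytes, Charset charset)
            throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        String umlaut = "\u00F6"; // the German umlaut "ö"

        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16
        };

        // Report, per charset, whether the umlaut can be encoded at all
        // and whether it survives an encode/decode round trip.
        for (Charset cs : charsets) {
            boolean encodable = cs.newEncoder().canEncode(umlaut);
            boolean survives = roundTrip(umlaut, cs).equals(umlaut);
            System.out.println(cs.name() + ": canEncode=" + encodable
                    + ", survives round trip=" + survives);
        }

        // The single ISO-8859-1 byte for "ö" is not valid UTF-8 on its own;
        // a strict decoder reports that instead of producing U+FFFD.
        try {
            decodeStrict(new byte[] {(byte) 0xF6}, StandardCharsets.UTF_8);
            System.out.println("decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("strict UTF-8 decoding failed: " + e);
        }
    }
}
```

The point is the same as above: the charset used to read the bytes has to match the one they were written in, and it has to be able to represent the umlauts in the first place.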