How to convert a string with Unicode encoding to a string of letters


`StringEscapeUtils.unescapeJava()` from Apache Commons Lang decodes it properly: it turns `\uXXXX` escapes (and other Java escapes such as `\n`) into the corresponding characters.

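```java
// Commons Lang 2.x package. In Lang 3 the class is org.apache.commons.lang3.StringEscapeUtils,
// deprecated since 3.6 in favor of org.apache.commons.text.StringEscapeUtils (Commons Text).
import org.apache.commons.lang.StringEscapeUtils;
import org.junit.Test;

@Test
public void testUnescapeJava() {
    String sJava = "\\u0048\\u0065\\u006C\\u006C\\u006F";
    System.out.println("StringEscapeUtils.unescapeJava(sJava):\n" + StringEscapeUtils.unescapeJava(sJava));
}
```

Output:

```
StringEscapeUtils.unescapeJava(sJava):
Hello
```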


Technically, writing:

```java
String myString = "\u0048\u0065\u006C\u006C\u006F World";
```

automatically converts it to "Hello World", because the Java compiler translates `\uXXXX` escapes in source code. So I assume you are reading the string from some file, where the escapes arrive as literal text. To convert that text to "Hello", you'll have to split it into the individual Unicode escapes (take each `\uXXXX` and keep just the `XXXX`), call `Integer.parseInt(XXXX, 16)` to parse those hex digits into an `int`, and then cast that `int` to `char` to get the actual character.
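To make that distinction concrete, here is a small sketch; the class `UnicodeEscapeDemo` and the helper `decodeOne` are hypothetical names chosen for illustration. It shows that escapes written in source are already decoded by the compiler, that doubling the backslash keeps them as literal text (as if read from a file), and how one escape is decoded with `Integer.parseInt(..., 16)` plus a cast to `char`:

```java
public class UnicodeEscapeDemo {

    // Decodes one six-character escape (backslash, 'u', four hex digits):
    // drop the first two characters, parse the rest as base 16, cast to char.
    static char decodeOne(String escape) {
        String hexDigits = escape.substring(2, 6);
        return (char) Integer.parseInt(hexDigits, 16);
    }

    public static void main(String[] args) {
        // The compiler translates these escapes, so this string already reads "Hello World".
        String compiled = "\u0048\u0065\u006C\u006C\u006F World";

        // Doubled backslashes keep the escapes as literal text, which is what you get
        // when the same characters are read from a file or another external source.
        String literal = "\\u0048\\u0065\\u006C\\u006C\\u006F World";

        System.out.println(compiled);             // Hello World
        System.out.println(literal.length());     // 36: five 6-character escapes plus " World"
        System.out.println(decodeOne("\\u0048")); // H
    }
}
```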

Edit: Some code to accomplish this:

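```java
// Assumes myString holds literal backslash-u escapes (e.g. read from a file),
// not text the compiler has already decoded.
String str = myString.split(" ")[0];    // keep only the escaped word before the space
str = str.replace("\\", "");            // "u0048u0065u006Cu006Cu006F"
String[] arr = str.split("u");          // ["", "0048", "0065", "006C", "006C", "006F"]
StringBuilder text = new StringBuilder();
for (int i = 1; i < arr.length; i++) {  // arr[0] is the empty string before the first 'u'
    int hexVal = Integer.parseInt(arr[i], 16);
    text.append((char) hexVal);
}
// text.toString() is now "Hello"
```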
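The split-based code above only works when the input is nothing but escapes: it keeps just the first word, and a literal `u` in ordinary text would break the split. A sketch of a regex-based alternative, assuming Java 8 or later and using the hypothetical names `UnicodeUnescaper` and `unescapeUnicode`, decodes only `\uXXXX` sequences and copies all other text through unchanged:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeUnescaper {

    // Matches a backslash, the letter u, and exactly four hex digits (captured as group 1).
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces every escape with the char it encodes and leaves all other text untouched.
    static String unescapeUnicode(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer(); // StringBuffer overload works on Java 8
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded '\' or '$' from being read as replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String fromFile = "\\u0048\\u0065\\u006C\\u006C\\u006F World";
        System.out.println(unescapeUnicode(fromFile)); // Hello World
    }
}
```

Because each escape is one UTF-16 code unit, a surrogate pair such as `\uD83D\uDE00` comes out as a valid two-`char` sequence. Unlike `StringEscapeUtils.unescapeJava()`, this sketch does not handle other escapes such as `\n` or `\t`.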


You can use `StringEscapeUtils` from Apache Commons Lang, e.g.:

```java
String title = StringEscapeUtils.unescapeJava("\\u0048\\u0065\\u006C\\u006C\\u006F"); // "Hello"
```