How to convert a string with Unicode encoding to a string of letters
The Apache Commons Lang StringEscapeUtils.unescapeJava() can decode it properly.
    import org.apache.commons.lang.StringEscapeUtils;

    @Test
    public void testUnescapeJava() {
        String sJava = "\\u0048\\u0065\\u006C\\u006C\\u006F";
        System.out.println("StringEscapeUtils.unescapeJava(sJava):\n" + StringEscapeUtils.unescapeJava(sJava));
    }

Output:

    StringEscapeUtils.unescapeJava(sJava):
    Hello
Technically, writing:

    String myString = "\u0048\u0065\u006C\u006C\u006F World";

already gives you "Hello World", because the Java compiler translates \uXXXX escapes in source code. So I assume you are reading the string in from some file. To convert it to "Hello" you'll have to parse the text into the separate Unicode escapes (take each \uXXXX and keep just the XXXX), then call Integer.parseInt(XXXX, 16) to turn the hex digits into a number, and then cast that to char to get the actual character.
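To make the distinction concrete, here is a minimal sketch that compares a literal the compiler has already translated with one that still holds the six raw characters, and then converts a single escape by hand. The class name EscapeDemo and the helper decodeOne are made up for illustration, and the helper assumes its argument is exactly one well-formed escape.

```java
public class EscapeDemo {
    // Converts one six-character escape (backslash, 'u', four hex digits) to its char.
    public static char decodeOne(String escape) {
        String hex = escape.substring(2);          // drop the backslash and the 'u'
        return (char) Integer.parseInt(hex, 16);   // hex digits -> number -> char
    }

    public static void main(String[] args) {
        String compiled = "\u0048";   // the compiler has already turned this into "H"
        String raw = "\\u0048";       // six characters, as if read from a file

        System.out.println(compiled.length());  // prints 1
        System.out.println(raw.length());       // prints 6
        System.out.println(decodeOne(raw));     // prints H
    }
}
```

Only the second string needs decoding at runtime; the first one never contained an escape as far as the running program is concerned.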
Edit: Some code to accomplish this:
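    // myString must hold the raw escaped text (backslash, u, four hex digits), as read from the file
    String str = myString.split(" ")[0];   // keep only the escaped first word
    str = str.replace("\\", "");           // drop the backslashes, leaving u0048u0065...
    String[] arr = str.split("u");         // arr[0] is empty, the rest are the hex digits
    String text = "";
    for (int i = 1; i < arr.length; i++) {
        int hexVal = Integer.parseInt(arr[i], 16);
        text += (char) hexVal;
    }
    // text now holds "Hello"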
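The split-based snippet above only works when the escaped word contains no other letter u and the escapes sit at the start of the string. As a hedged alternative, the sketch below uses java.util.regex to replace each escape wherever it appears and copies everything else through unchanged. The class name UnicodeUnescaper and the method unescape are hypothetical, and the sketch deliberately handles only the backslash-u form with exactly four hex digits, not other Java escapes such as \n or \t.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeUnescaper {
    // A literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    public static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());                   // copy text before the escape
            out.append((char) Integer.parseInt(m.group(1), 16));  // decode the four hex digits
            last = m.end();
        }
        out.append(input, last, input.length());                  // copy the remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        // Doubled backslashes so the string holds raw escapes, as if read from a file.
        String fromFile = "\\u0048\\u0065\\u006C\\u006C\\u006F blue sun";
        System.out.println(unescape(fromFile));  // prints Hello blue sun
    }
}
```

Because the plain words "blue" and "sun" contain the letter u, this input would break the split("u") approach, while the regex version leaves them alone.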
You can use StringEscapeUtils from Apache Commons Lang, e.g.:

    String title = StringEscapeUtils.unescapeJava("\\u0048\\u0065\\u006C\\u006C\\u006F");