How to convert \uXXXX Unicode escapes to UTF-8 using console tools in *nix
Might be a bit ugly, but echo -e should do it:
echo -en "$(curl "$URL")"
-e interprets backslash escapes, and -n suppresses the trailing newline that echo would normally add.
Note: The \u escape works in the bash builtin echo, but not in /usr/bin/echo.
As pointed out in the comments, this requires bash 4.2+, and bash 4.2.x has a bug handling the values \u0080 to \u00ff (0x80-0xff).
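For reuse, the same echo -e approach can be wrapped in a small bash function. This is a sketch, assuming bash 4.2 or newer and that a C.UTF-8 locale is installed: bash encodes \u escapes in the current locale, so in a non-UTF-8 locale the output may not be UTF-8. The name decode_u_escapes is a hypothetical helper, not a standard command.

```shell
#!/usr/bin/env bash
# Sketch: decode \uXXXX escapes on stdin with bash's builtin echo -e.
# Assumptions: bash >= 4.2 and an installed C.UTF-8 locale.
# decode_u_escapes is a hypothetical helper name.
set -euo pipefail

if (( BASH_VERSINFO[0] < 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] < 2) )); then
  echo "bash 4.2+ is required for \\u escapes in echo -e" >&2
  exit 1
fi

# The body runs in a subshell ( ... ) so the LC_ALL change does not leak out.
decode_u_escapes() (
  # bash converts \u escapes using the current locale, so force a UTF-8 one.
  LC_ALL=C.UTF-8
  # -e: interpret backslash escapes; -n: do not append a newline.
  # Note: $(cat) drops any trailing newlines from the input.
  echo -en "$(cat)"
)
```

Usage, with $URL as in the answer: curl -s "$URL" | decode_u_escapes. Keep in mind that echo -e also interprets other escapes such as \n, \t and \\, so literal backslashes in the input can be altered.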
I don't know which distribution you are using, but uni2ascii should be included in its package repositories.
$ sudo apt-get install uni2ascii
It only depends on libc6, so it's a lightweight solution (uni2ascii i386 4.18-2 is 55.0 kB on Ubuntu)!
Then to use it:
$ echo 'Character 1: \u0144, Character 2: \u00f3' | ascii2uni -a U -q
Character 1: ń, Character 2: ó
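To convert a whole file or a pipe rather than a single string, the same ascii2uni invocation can sit in a small wrapper that first checks the tool is installed. This is a sketch; decode_with_ascii2uni and the file names in the usage line below are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: decode \uXXXX escapes from stdin with ascii2uni (uni2ascii package).
# decode_with_ascii2uni is a hypothetical helper name, not part of uni2ascii.
set -euo pipefail

decode_with_ascii2uni() {
  # Fail early with a clear message if the package is not installed.
  if ! command -v ascii2uni >/dev/null 2>&1; then
    echo "ascii2uni not found; install the uni2ascii package" >&2
    return 127
  fi
  # -a U: convert only \uXXXX escapes; -q: quiet mode (same flags as the example).
  ascii2uni -a U -q
}
```

Usage: decode_with_ascii2uni < input.txt > output.txt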
I found native2ascii from the JDK to be the best way to do it:
native2ascii -encoding UTF-8 -reverse src.txt dest.txt
A detailed description is here: http://docs.oracle.com/javase/1.5.0/docs/tooldocs/windows/native2ascii.html
Update: native2ascii is no longer available since JDK 9: https://bugs.openjdk.java.net/browse/JDK-8074431
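Because native2ascii is gone from JDK 9 onwards, a script that relies on it may want to check for the tool before calling it. A minimal sketch, assuming a JDK 8 or older bin directory is on PATH; reverse_native2ascii is a hypothetical wrapper name, and it passes exactly the flags from the command above.

```shell
#!/usr/bin/env bash
# Sketch: wrap the native2ascii -reverse call from the answer.
# Assumption: native2ascii comes from a JDK 8 or older install on PATH,
# since the tool was removed in JDK 9. reverse_native2ascii is hypothetical.
set -euo pipefail

reverse_native2ascii() {
  local src=$1 dest=$2
  if ! command -v native2ascii >/dev/null 2>&1; then
    echo "native2ascii not found (removed in JDK 9); use a JDK 8 or older" >&2
    return 127
  fi
  # -reverse turns \uXXXX escapes back into characters;
  # -encoding UTF-8 makes the output file UTF-8.
  native2ascii -encoding UTF-8 -reverse "$src" "$dest"
}
```

Usage: reverse_native2ascii src.txt dest.txt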