Unicode character in PHP string
PHP 7.0.0 introduced the "Unicode codepoint escape" syntax.
It's now possible to write Unicode characters directly in a double-quoted string or a heredoc, without calling any function.
$unicodeChar = "\u{1000}";
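As a minimal sketch of where the escape is and is not expanded (the variable names are only illustrative, and the source file is assumed to be run on PHP 7.0 or later), the following compares a double-quoted string, a heredoc, and a single-quoted string:

```php
<?php
// Double-quoted string: the escape is expanded to the UTF-8 bytes of U+1000.
$double = "\u{1000}";

// A heredoc is parsed like a double-quoted string, so the escape works here too.
$heredoc = <<<EOT
\u{1000}
EOT;

// Single-quoted string: no expansion, the text stays literal.
$single = '\u{1000}';

echo bin2hex($double), "\n";   // e18080
echo bin2hex($heredoc), "\n";  // e18080
echo $single, "\n";            // \u{1000}
```

The escape always produces UTF-8 bytes, regardless of the encoding the script file itself is saved in.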
Because JSON directly supports the \uxxxx syntax, the first thing that comes to mind is:
$unicodeChar = '\u1000';
echo json_decode('"'.$unicodeChar.'"');
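The JSON approach can be wrapped in a small function. This is only a sketch: decodeJsonEscape() is a hypothetical name, not a built-in, and it assumes the input contains nothing but valid JSON \uxxxx escapes. Code points above U+FFFF have to be written as a UTF-16 surrogate pair, because JSON escapes are limited to four hex digits.

```php
<?php
// Hypothetical helper: turns JSON-style escapes such as '\u1000'
// into the corresponding UTF-8 string.
function decodeJsonEscape(string $escape): string
{
    // Wrap the escape in double quotes so it forms a valid JSON string literal.
    $decoded = json_decode('"' . $escape . '"');
    if (!is_string($decoded)) {
        throw new InvalidArgumentException('Not a valid JSON string escape');
    }
    return $decoded;
}

echo bin2hex(decodeJsonEscape('\u1000')), "\n";        // e18080
// U+1F600 written as the surrogate pair D83D DE00.
echo bin2hex(decodeJsonEscape('\ud83d\ude00')), "\n";  // f09f9880
```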
Another option would be to use mb_convert_encoding() (note that the 'HTML-ENTITIES' pseudo-encoding is deprecated as of PHP 8.2):
echo mb_convert_encoding('&#x1000;', 'UTF-8', 'HTML-ENTITIES');
or make use of the direct mapping between UTF-16BE (big endian) and the Unicode codepoint:
echo mb_convert_encoding("\x10\x00", 'UTF-8', 'UTF-16BE');
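The direct mapping to UTF-16BE holds only for code points up to U+FFFF. As a hedged sketch (assuming the mbstring extension is loaded; the variable names are illustrative), UTF-32BE keeps the one-to-one mapping for every code point, at the cost of four bytes per character:

```php
<?php
// U+1000 fits in 16 bits, so its UTF-16BE bytes are the code point itself.
$bmp = mb_convert_encoding("\x10\x00", 'UTF-8', 'UTF-16BE');

// U+1F600 does not fit in 16 bits; in UTF-32BE the four bytes are
// simply the code point written as a big-endian 32-bit integer.
$astral = mb_convert_encoding("\x00\x01\xF6\x00", 'UTF-8', 'UTF-32BE');

echo bin2hex($bmp), "\n";     // e18080
echo bin2hex($astral), "\n";  // f09f9880
```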
I wonder why no one has mentioned this yet, but you can do an almost equivalent version using escape sequences in double-quoted strings:
\x[0-9A-Fa-f]{1,2}
The sequence of characters matching the regular expression is a character in hexadecimal notation.
ASCII example:
<?php echo("\x48\x65\x6C\x6C\x6F\x20\x57\x6F\x72\x6C\x64\x21");?>
Hello World!
So for your case, all you need to do is:
$str = "\x30\xA2";
But these are bytes, not characters. For code points in the Basic Multilingual Plane (up to U+FFFF), the byte representation of the Unicode codepoint coincides with UTF-16 big endian, so we could print it out directly as such:
<?php header('content-type:text/html;charset=utf-16be'); echo("\x30\xA2");?>
ア
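To make the difference between bytes and characters concrete, here is a small sketch (assuming the mbstring extension is available) that measures the same string both ways:

```php
<?php
$str = "\x30\xA2";

// strlen() counts bytes and knows nothing about encodings.
$byteCount = strlen($str);

// mb_strlen() counts characters once it is told the bytes are UTF-16BE.
$charCount = mb_strlen($str, 'UTF-16BE');

echo $byteCount, "\n";  // 2
echo $charCount, "\n";  // 1
```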
If you are using a different encoding, you'll need to alter the bytes accordingly (mostly done with a library, though possible by hand too).
UTF-16 little endian example:
<?php header('content-type:text/html;charset=utf-16le'); echo("\xA2\x30");?>
ア
UTF-8 example:
<?php header('content-type:text/html;charset=utf-8'); echo("\xE3\x82\xA2");?>
ア
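For the "by hand" route, the UTF-8 bytes can be computed from the code point with a few shifts and masks. The function below is a sketch under that assumption; codePointToUtf8() is a hypothetical name, not a PHP built-in.

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8 by hand.
function codePointToUtf8(int $cp): string
{
    // Reject values outside the Unicode range and the surrogate block.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException('Not a Unicode scalar value');
    }
    if ($cp < 0x80) {            // 1 byte:  0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
            . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(codePointToUtf8(0x30A2)), "\n";  // e382a2
```

The result for U+30A2 matches the "\xE3\x82\xA2" bytes used in the UTF-8 example.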
There is also the pack() function, but you can expect it to be slow.
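A rough sketch of the pack() approach, using the same code point as above (the variable names are illustrative): the 'n', 'v' and 'N' format codes write an integer as big-endian 16-bit, little-endian 16-bit and big-endian 32-bit respectively, which lines up with UTF-16BE, UTF-16LE and UTF-32BE.

```php
<?php
// 'n' = unsigned 16-bit, big endian: the UTF-16BE bytes of U+30A2.
$utf16be = pack('n', 0x30A2);

// 'v' = unsigned 16-bit, little endian: the UTF-16LE bytes of U+30A2.
$utf16le = pack('v', 0x30A2);

// 'N' = unsigned 32-bit, big endian: UTF-32BE, which also covers
// code points above U+FFFF such as U+1F600.
$utf32be = pack('N', 0x1F600);

echo bin2hex($utf16be), "\n";  // 30a2
echo bin2hex($utf16le), "\n";  // a230
echo bin2hex($utf32be), "\n";  // 0001f600
```

The packed bytes can then be passed to mb_convert_encoding() with the matching source encoding to get UTF-8.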