
Unicode character in PHP string


PHP 7.0.0 has introduced the "Unicode codepoint escape" syntax.

It's now possible to write Unicode characters easily by using a double-quoted or a heredoc string, without calling any function.

$unicodeChar = "\u{1000}";
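To see what the escape actually produces, here is a minimal sketch that dumps the resulting bytes. It assumes PHP 7.0 or later and that the string is meant to be UTF-8 (which is what the escape emits); the variable names are only for illustration.

```php
<?php
// "\u{...}" is expanded into the UTF-8 encoding of the given code point.
$bmp    = "\u{1000}";  // U+1000, inside the Basic Multilingual Plane
$astral = "\u{1F600}"; // U+1F600, outside the Basic Multilingual Plane

echo bin2hex($bmp), "\n";    // e18080
echo bin2hex($astral), "\n"; // f09f9880

// Single-quoted strings do not interpret the escape at all.
$literal = '\u{1000}';
echo strlen($literal), "\n"; // 8 (the eight literal characters)
```

Unlike the JSON-style `\uxxxx` form, the braces allow more than four hex digits, so code points above U+FFFF need no special handling.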


Because JSON directly supports the \uxxxx syntax, the first thing that comes to my mind is:

$unicodeChar = '\u1000';
echo json_decode('"'.$unicodeChar.'"');
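One caveat with the JSON route: `\uxxxx` takes exactly four hex digits, so a code point above U+FFFF has to be written as a UTF-16 surrogate pair. The sketch below wraps the trick in a helper; `jsonUnescape` is a hypothetical name chosen for this example, not a built-in.

```php
<?php
// Hypothetical helper: wraps the escape in a JSON string literal and lets
// json_decode() do the conversion. Returns null if the input is not valid JSON.
function jsonUnescape(string $escaped): ?string
{
    return json_decode('"' . $escaped . '"');
}

$ka   = jsonUnescape('\u1000');       // U+1000
$grin = jsonUnescape('\ud83d\ude00'); // U+1F600 as a surrogate pair

echo bin2hex($ka), "\n";   // e18080
echo bin2hex($grin), "\n"; // f09f9880
```

The input must stay in single quotes (or have its backslashes escaped) so that PHP passes the `\u` sequence through to json_decode() untouched.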

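Another option would be to use mb_convert_encoding() to decode an HTML numeric character reference (note that the HTML-ENTITIES encoding is deprecated as of PHP 8.2):

echo mb_convert_encoding('&#x1000;', 'UTF-8', 'HTML-ENTITIES');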

or make use of the direct mapping between UTF-16BE (big endian) and the Unicode codepoint, which holds for code points up to U+FFFF:

echo mb_convert_encoding("\x10\x00", 'UTF-8', 'UTF-16BE');
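The two-byte UTF-16BE trick only lines up with the code point inside the Basic Multilingual Plane. As a sketch of how the same idea extends further, UTF-32BE stores every code point as four big-endian bytes. This assumes the mbstring extension is loaded; the variable names are illustrative.

```php
<?php
// U+1000 written as its two UTF-16BE bytes.
$fromUtf16 = mb_convert_encoding("\x10\x00", 'UTF-8', 'UTF-16BE');

// U+1F600 does not fit in two bytes; in UTF-32BE it is simply 00 01 F6 00.
$fromUtf32 = mb_convert_encoding("\x00\x01\xF6\x00", 'UTF-8', 'UTF-32BE');

echo bin2hex($fromUtf16), "\n"; // e18080
echo bin2hex($fromUtf32), "\n"; // f09f9880
```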


I wonder why no one has mentioned this yet, but you can do an almost equivalent version using escape sequences in double-quoted strings:

\x[0-9A-Fa-f]{1,2}

The sequence of characters matching the regular expression is a character in hexadecimal notation.

ASCII example:

<?php
    echo("\x48\x65\x6C\x6C\x6F\x20\x57\x6F\x72\x6C\x64\x21");
?>

Hello World!

So in your case, all you need is the string "\x30\xA2". Keep in mind that these are bytes, not characters. For code points up to U+FFFF, the byte representation of the Unicode codepoint coincides with UTF-16 big endian, so we can print it out directly as such:

<?php
    header('content-type:text/html;charset=utf-16be');
    echo("\x30\xA2");
?>

If you are using a different encoding, you'll need to alter the bytes accordingly (mostly done with a library, though possible by hand too).

UTF-16 little endian example:

<?php
    header('content-type:text/html;charset=utf-16le');
    echo("\xA2\x30");
?>

UTF-8 example:

<?php
    header('content-type:text/html;charset=utf-8');
    echo("\xE3\x82\xA2");
?>
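For the "by hand" route, the UTF-8 bytes can be derived from the code point with a few shifts and masks. The following is a minimal sketch, not a library replacement; `utf8Encode` is a hypothetical helper name introduced for this example.

```php
<?php
// Hypothetical helper: encodes one Unicode code point as a UTF-8 byte string.
function utf8Encode(int $cp): string
{
    // Reject values outside the Unicode range and the surrogate block.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException('Not a valid Unicode scalar value');
    }
    if ($cp < 0x80) {            // 1 byte:  0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(utf8Encode(0x30A2)), "\n"; // e382a2
```

The result for U+30A2 matches the three bytes used in the UTF-8 example above.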

There is also the pack function, but as a runtime function call you can expect it to be slower than an escape sequence written directly in the string.
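As a rough sketch of the pack() approach: the `n` and `v` format codes write a 16-bit value in big-endian and little-endian order, and `N` writes a 32-bit big-endian value. The conversion step assumes the mbstring extension is loaded; the variable names are illustrative.

```php
<?php
$codepoint = 0x30A2;

$utf16be = pack('n', $codepoint); // same bytes as "\x30\xA2"
$utf16le = pack('v', $codepoint); // same bytes as "\xA2\x30"

// Packing as 32-bit big endian gives UTF-32BE, which works for any code point
// and can then be converted to UTF-8.
$utf8 = mb_convert_encoding(pack('N', $codepoint), 'UTF-8', 'UTF-32BE');

echo bin2hex($utf16be), "\n"; // 30a2
echo bin2hex($utf16le), "\n"; // a230
echo bin2hex($utf8), "\n";    // e382a2
```

This is mainly useful when the code point is only known at runtime, where a literal escape sequence is not an option.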