htmlentities not working for emoji
This works for characters that have regular HTML entities, for UTF-8 emoticons (and other non-ASCII UTF-8 characters), and of course for plain strings.
The only trouble I had was with an empty string value, so I had to add a check for it to the function.
function entities( $string ) {
    $stringBuilder = "";
    $offset = 0;

    // compare against "" instead of using empty(), which would also treat "0" as empty
    if ( $string === "" ) {
        return "";
    }

    while ( $offset >= 0 ) {
        // ordutf8() moves $offset past the current character, or sets it to -1 at the end of the string
        $decValue = ordutf8( $string, $offset );
        $char = unichr( $decValue );
        $htmlEntited = htmlentities( $char );

        if ( $char != $htmlEntited ) {
            // the character has a named entity (e.g. &amp;, &lt;)
            $stringBuilder .= $htmlEntited;
        } elseif ( $decValue >= 128 ) {
            // non-ASCII character without a named entity: emit a numeric reference
            $stringBuilder .= "&#" . $decValue . ";";
        } else {
            // plain ASCII character
            $stringBuilder .= $char;
        }
    }

    return $stringBuilder;
}

// source - http://php.net/manual/en/function.ord.php#109812
function ordutf8($string, &$offset) {
    $code = ord(substr($string, $offset, 1));
    if ($code >= 128) {                                 //otherwise 0xxxxxxx
        if ($code < 224) $bytesnumber = 2;              //110xxxxx
        else if ($code < 240) $bytesnumber = 3;         //1110xxxx
        else if ($code < 248) $bytesnumber = 4;         //11110xxx
        $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
        for ($i = 2; $i <= $bytesnumber; $i++) {
            $offset ++;
            $code2 = ord(substr($string, $offset, 1)) - 128;   //10xxxxxx
            $codetemp = $codetemp*64 + $code2;
        }
        $code = $codetemp;
    }
    $offset += 1;
    if ($offset >= strlen($string)) $offset = -1;
    return $code;
}

// source - http://php.net/manual/en/function.chr.php#88611
// note: the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2; mb_chr($u, 'UTF-8') (PHP 7.2+) is an alternative
function unichr($u) {
    return mb_convert_encoding('&#' . intval($u) . ';', 'UTF-8', 'HTML-ENTITIES');
}

/* ---- */

var_dump( entities( "&" ) );
var_dump( entities( "<" ) );
var_dump( entities( "😎" ) );
var_dump( entities( "☚" ) );
var_dump( entities( "" ) );
var_dump( entities( "A" ) );
var_dump( entities( "Hello 😎 world" ) );
var_dump( entities( "this & that 😎" ) );
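On PHP 5.4 or later, roughly the same result can probably be had without a hand-written UTF-8 decoder: let htmlentities() produce the named entities first, then pass what is left through mb_encode_numericentity() so every remaining non-ASCII character becomes a decimal reference. This is a minimal sketch, assuming valid UTF-8 input, the mbstring extension, and the HTML 4.01 entity table; the function name entitiesViaMbstring is hypothetical, not part of the answer above.

```php
<?php
// Hypothetical helper: named entities where they exist, decimal references for other non-ASCII characters.
// Assumes valid UTF-8 input and that the mbstring extension is loaded.
function entitiesViaMbstring(string $string): string
{
    // Step 1: &, <, >, quotes and HTML 4.01 named characters (e.g. é -> &eacute;)
    $named = htmlentities($string, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // Step 2: anything still >= U+0080 (e.g. emoji) becomes &#NNNN;
    // map format: [start code point, end code point, offset, mask]
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($named, $map, 'UTF-8');
}

echo entitiesViaMbstring("this & that \u{1F60E}"), "\n"; // this &amp; that &#128526;
```

Doing the named-entity pass first matters: after it, the only non-ASCII bytes left are characters with no named entity, so the numeric pass never touches anything htmlentities() already handled.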
$emoji = "\xF0\x9F\x98\x8E"; // this is the emoji (U+1F60E)
I got this callback from "convert unicode to html entities hex":
$hex = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($m) {
    $char = current($m);
    $utf = iconv('UTF-8', 'UCS-4', $char);
    return sprintf("&#x%s;", ltrim(strtoupper(bin2hex($utf)), "0"));
}, $emoji);
echo $hex;
echo json_encode("\xF0\x9F\x98\x8E");
// json_encode() escapes it (as "\ud83d\ude0e"), but htmlentities() doesn't work with it.
Is this OK?
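For comparison, the preg_replace_callback approach from the question can likely be written without the iconv/UCS-4 round trip by asking mbstring for the code point directly. This is a sketch assuming PHP 7.2+ (for mb_ord()) and valid UTF-8 input; toHexReferences is a hypothetical name. It also shows that htmlentities() leaves the emoji unchanged, because there is no named entity for it.

```php
<?php
// Hypothetical helper: replace every non-ASCII character with a hex reference like &#x1F60E;
// Requires PHP 7.2+ for mb_ord(); the /u modifier makes the pattern match whole UTF-8 characters.
// Returns null if $string is not valid UTF-8 (preg_replace_callback fails in that case).
function toHexReferences(string $string): ?string
{
    return preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function (array $m): string {
        // $m[0] is one complete UTF-8 character; %X prints its code point in uppercase hex
        return sprintf('&#x%X;', mb_ord($m[0], 'UTF-8'));
    }, $string);
}

$emoji = "\xF0\x9F\x98\x8E";

echo toHexReferences($emoji), "\n";                             // &#x1F60E;
var_dump(htmlentities($emoji, ENT_QUOTES, 'UTF-8') === $emoji); // bool(true): no named entity exists
```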
The htmlentities documentation states that
all characters which have HTML character entity equivalents are translated into these entities.
Your emoji does not have an equivalent the way &lt; is for <, so it doesn't get converted. A numeric reference such as &#x1F60E; (decimal &#128526;) is just an HTML code, not an HTML entity.
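One way to check this for a given character is to look it up in the table htmlentities() itself uses, via get_html_translation_table(). A small sketch, assuming UTF-8 and the HTML5 entity set (ENT_HTML5): the emoji has no key in that table, while a numeric reference still decodes back to the same character.

```php
<?php
$emoji = "\u{1F60E}"; // same bytes as "\xF0\x9F\x98\x8E"

// The table htmlentities() uses: character => named entity
$table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML5, 'UTF-8');

var_dump($table['<']);           // string(4) "&lt;": '<' has a named entity
var_dump(isset($table[$emoji])); // bool(false): the emoji has none

// A numeric reference is not in the table, but html_entity_decode() (and browsers) still understand it
var_dump(html_entity_decode('&#x1F60E;', ENT_QUOTES | ENT_HTML5, 'UTF-8') === $emoji); // bool(true)
```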
function htmlEntitiesOrCode($string) {
    //try htmlentities first
    $result = htmlentities($string, ENT_COMPAT, "UTF-8");
    //if the output is different from input, an entity was returned
    if ($result != $string) {
        return $result;
    }

    //get the html code (note: only the first UTF-8 character of $string is converted)
    $offset = 0;
    $code = ord(substr($string, $offset, 1));
    if ($code >= 128) {
        if ($code < 224) {
            $bytesnumber = 2;
        } else if ($code < 240) {
            $bytesnumber = 3;
        } else if ($code < 248) {
            $bytesnumber = 4;
        }
        $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
        for ($i = 2; $i <= $bytesnumber; $i++) {
            $offset ++;
            $code2 = ord(substr($string, $offset, 1)) - 128;
            $codetemp = $codetemp*64 + $code2;
        }
        $code = $codetemp;
    }
    //terminate the numeric reference with ';'
    $result = "&#" . $code . ";";
    return $result;
}
HTML code function taken from here: http://php.net/manual/en/function.ord.php#109812
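htmlEntitiesOrCode() only converts the first character of its argument. Below is a sketch of the same "named entity first, otherwise numeric code" idea applied to every character of a string, assuming PHP 7.4+ (for mb_str_split(); mb_ord() needs 7.2+) and valid UTF-8 input. The function name htmlEntitiesOrCodeAll is hypothetical.

```php
<?php
// Hypothetical whole-string variant: named entity when one exists, otherwise &#N; for non-ASCII.
// Assumes PHP 7.4+ and valid UTF-8 input.
function htmlEntitiesOrCodeAll(string $string): string
{
    $out = '';
    foreach (mb_str_split($string, 1, 'UTF-8') as $char) {
        $entity = htmlentities($char, ENT_COMPAT, 'UTF-8');
        if ($entity !== $char) {
            $out .= $entity;                             // e.g. & -> &amp;
        } elseif (strlen($char) > 1) {
            $out .= '&#' . mb_ord($char, 'UTF-8') . ';'; // multibyte character without a named entity
        } else {
            $out .= $char;                               // plain ASCII stays as-is
        }
    }
    return $out;
}

echo htmlEntitiesOrCodeAll("Hello \u{1F60E} & <world>"), "\n"; // Hello &#128526; &amp; &lt;world&gt;
```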