htmlentities not working for emoji


This works for regular HTML entities, UTF-8 emoji (and other multibyte UTF-8 characters), and of course plain strings.

I ran into trouble with empty string values, so I had to add an explicit check for them at the top of the function.

    function entities( $string ) {
        $stringBuilder = "";
        $offset = 0;

        // Guard against empty input: ordutf8() would have no character to read.
        // A strict comparison is used because empty() also rejects the string "0".
        if ( $string === null || $string === "" ) {
            return "";
        }

        while ( $offset >= 0 ) {
            $decValue = ordutf8( $string, $offset );
            $char = unichr( $decValue );
            $htmlEntited = htmlentities( $char );
            if ( $char != $htmlEntited ) {
                // A named entity exists (e.g. &amp;), so use it.
                $stringBuilder .= $htmlEntited;
            } elseif ( $decValue >= 128 ) {
                // No named entity: fall back to a decimal numeric reference.
                $stringBuilder .= "&#" . $decValue . ";";
            } else {
                $stringBuilder .= $char;
            }
        }
        return $stringBuilder;
    }

    // source - http://php.net/manual/en/function.ord.php#109812
    // Returns the code point at $offset and advances $offset; sets it to -1 at the end.
    function ordutf8($string, &$offset) {
        $code = ord(substr($string, $offset, 1));
        if ($code >= 128) {                              // otherwise 0xxxxxxx
            if ($code < 224) $bytesnumber = 2;           // 110xxxxx
            else if ($code < 240) $bytesnumber = 3;      // 1110xxxx
            else if ($code < 248) $bytesnumber = 4;      // 11110xxx
            $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
            for ($i = 2; $i <= $bytesnumber; $i++) {
                $offset++;
                $code2 = ord(substr($string, $offset, 1)) - 128;   // 10xxxxxx
                $codetemp = $codetemp * 64 + $code2;
            }
            $code = $codetemp;
        }
        $offset += 1;
        if ($offset >= strlen($string)) $offset = -1;
        return $code;
    }

    // source - http://php.net/manual/en/function.chr.php#88611
    // Note: the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2.
    function unichr($u) {
        return mb_convert_encoding('&#' . intval($u) . ';', 'UTF-8', 'HTML-ENTITIES');
    }

    /* ---- */

    var_dump( entities( "&" ) );
    var_dump( entities( "<" ) );
    var_dump( entities( "😎" ) );
    var_dump( entities( "☚" ) );
    var_dump( entities( "" ) );
    var_dump( entities( "A" ) );
    var_dump( entities( "Hello 😎 world" ) );
    var_dump( entities( "this & that 😎" ) );
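The arithmetic inside ordutf8 can be checked by hand on the sunglasses emoji, whose UTF-8 bytes are F0 9F 98 8E. The following is only a minimal sketch and not part of the original answer: the helper name decodeFirstCodePoint is hypothetical, it assumes non-empty, valid UTF-8 input, and the mb_ord cross-check assumes PHP 7.2 or later with the mbstring extension.

```php
<?php
// Hypothetical helper: decode the first UTF-8 sequence of $bytes by hand,
// using the same bit layout that ordutf8 relies on.
function decodeFirstCodePoint(string $bytes): int
{
    $lead = ord($bytes[0]);
    if ($lead < 0x80) {
        return $lead;               // 0xxxxxxx: plain ASCII
    }
    if ($lead < 0xE0) {             // 110xxxxx: 2-byte sequence
        $length = 2;
        $code = $lead & 0x1F;
    } elseif ($lead < 0xF0) {       // 1110xxxx: 3-byte sequence
        $length = 3;
        $code = $lead & 0x0F;
    } else {                        // 11110xxx: 4-byte sequence
        $length = 4;
        $code = $lead & 0x07;
    }
    for ($i = 1; $i < $length; $i++) {
        // Each continuation byte (10xxxxxx) contributes six payload bits.
        $code = ($code << 6) | (ord($bytes[$i]) & 0x3F);
    }
    return $code;
}

$emoji  = "\xF0\x9F\x98\x8E";
$manual = decodeFirstCodePoint($emoji);

echo $manual, "\n";        // 128526
echo "&#", $manual, ";\n"; // the decimal reference built from that code point
var_dump($manual === mb_ord($emoji, 'UTF-8'));
```

Step by step, the lead byte F0 leaves 0 after masking, and the three continuation bytes add 31, 24 and 14: ((0 * 64 + 31) * 64 + 24) * 64 + 14 = 128526.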


$emoji = "\xF0\x9F\x98\x8E"; // this is your emoji (U+1F60E)

I got this callback from "convert unicode to html entities hex":

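    $hex = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($m) {
        $char = current($m);
        // UCS-4BE gives exactly four big-endian bytes per character, with no BOM.
        $utf = iconv('UTF-8', 'UCS-4BE', $char);
        // Hex-encode the bytes and strip the leading zeros.
        return sprintf("&#x%s;", ltrim(strtoupper(bin2hex($utf)), "0"));
    }, $emoji);
    echo $hex; // &#x1F60E;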

echo json_encode("\xF0\x9F\x98\x8E"); // it's decoded. htmlentities doesn't work with it.

Is this OK?


The htmlentities documentation states that:

all characters which have HTML character entity equivalents are translated into these entities.

Your emoji does not have a named equivalent the way &lt; is the equivalent of <, so it doesn't get converted. &#128526; is just a numeric HTML code (a numeric character reference), not a named HTML entity.
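The distinction can be checked directly. The sketch below is an illustration rather than part of the answer; it assumes the HTML 4.01 entity table (ENT_HTML401) and UTF-8 input. It looks both characters up in the table that htmlentities consults, then confirms that the numeric code still decodes back to the emoji.

```php
<?php
$emoji = "\xF0\x9F\x98\x8E"; // U+1F60E

// The translation table htmlentities() uses for HTML 4.01 named entities.
$table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');

var_dump(isset($table['<']));    // "<" has a named entity (&lt;)
var_dump(isset($table[$emoji])); // the emoji has none

// With no named entity available, htmlentities() returns the emoji unchanged.
var_dump(htmlentities($emoji, ENT_QUOTES | ENT_HTML401, 'UTF-8') === $emoji);

// A numeric reference needs no table entry; it is decoded by code point.
var_dump(html_entity_decode('&#128526;', ENT_QUOTES | ENT_HTML401, 'UTF-8') === $emoji);
```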

    function htmlEntitiesOrCode($string) {
        // Try htmlentities first.
        $result = htmlentities($string, ENT_COMPAT, "UTF-8");
        // If the output is different from the input, an entity was returned.
        if ($result != $string) {
            return $result;
        }
        // Otherwise compute the HTML code of the first character.
        $offset = 0;
        $code = ord(substr($string, $offset, 1));
        if ($code >= 128) {
            if ($code < 224) {
                $bytesnumber = 2;
            } else if ($code < 240) {
                $bytesnumber = 3;
            } else if ($code < 248) {
                $bytesnumber = 4;
            }
            $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
            for ($i = 2; $i <= $bytesnumber; $i++) {
                $offset++;
                $code2 = ord(substr($string, $offset, 1)) - 128;
                $codetemp = $codetemp * 64 + $code2;
            }
            $code = $codetemp;
        }
        // Terminate the reference with ";" so that it is a complete HTML code.
        return "&#" . $code . ";";
    }

HTML code function taken from here: http://php.net/manual/en/function.ord.php#109812
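Note that htmlEntitiesOrCode only looks at the first character when it falls back to a numeric code. For whole strings, one possible alternative (a sketch, not the method used in this answer) is to let htmlentities handle every character that has a named entity and then pass the result through mb_encode_numericentity, which turns the remaining non-ASCII characters into decimal codes. The function name encodeAllEntities is hypothetical, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: named entities where HTML 4.01 defines them,
// decimal numeric codes for every other non-ASCII character.
function encodeAllEntities(string $text): string
{
    $named = htmlentities($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // Map format: [first code point, last code point, offset, mask].
    // The mask must keep all 21 bits of a code point, hence 0x1FFFFF.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($named, $map, 'UTF-8');
}

echo encodeAllEntities("this & that \xF0\x9F\x98\x8E"), "\n"; // this &amp; that &#128526;
echo encodeAllEntities(""), "\n";                             // an empty string stays empty
```

Running htmlentities first keeps readable names such as &amp; and &lt;, and the second pass only touches characters at or above U+0080, so the entities produced by the first pass are left alone.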