How to convert Emojis to their respective HTML code entities in PHP 5.3? How to convert Emojis to their respective HTML code entities in PHP 5.3? php php

How to convert Emojis to their respective HTML code entities in PHP 5.3?


The code to detect the emoji bypasses stackoverflow's character limit, so here's a gist instead:

https://gist.github.com/BarryMode/432a7a1f9621e824c8a3a23084a50f60#file-htmlemoji-php

The entire function is essentially just

preg_replace_callback(pattern, callback, string);

The string is the input where you have emoji that you want to change into html entities. The pattern uses regex to find the emoji in the string and then each one is fed into the callback, which is where the conversion happens from emoji to html entity.

In creating this function, htmlemoji(), I combined a few different pieces of code that others had worked on. Here's some credits:

The callback uses this stackoverflow answer to build each entity.

The pattern was directly ripped from this source on GitHub.


I have created a trait for this Which is a mix of the two ideas bellow, it covers missing ones like. 🤩

How to convert Emojis to their respective HTML code entities in PHP 5.3

Idea taken from https://gist.github.com/BarryMode/432a7a1f9621e824c8a3a23084a50f60#file-htmlemoji-php andhttps://github.com/chefkoch-dev/morphoji

A mix of the 2 ideas above.

trait ConvertEmojis {

/** @var string */protected static $emojiPattern;public function convert($str) {    return preg_replace_callback($this->getEmojiPattern(), array(&$this, 'entity'), $str);}protected function entity($matches) {    return '&#'.hexdec(bin2hex(mb_convert_encoding("$matches[0]", 'UTF-32', 'UTF-8'))).';';}/** * Returns a regular expression pattern to detect emoji characters. * * @return string */protected function getEmojiPattern(){    if (null === self::$emojiPattern) {        $codeString = '';        foreach ($this->getEmojiCodeList() as $code) {            if (is_array($code)) {                $first = dechex(array_shift($code));                $last  = dechex(array_pop($code));                $codeString .= '\x{' . $first . '}-\x{' . $last . '}';            } else {                $codeString .= '\x{' . dechex($code) . '}';            }        }        self::$emojiPattern = "/[$codeString]/u";    }    return self::$emojiPattern;}/** * Returns an array with all unicode values for emoji characters. * * This is a function so the array can be defined with a mix of hex values * and range() calls to conveniently maintain the array with information * from the official Unicode tables (where values are given in hex as well). * * With PHP > 5.6 this could be done in class variable, maybe even a * constant. * * @return array */protected function getEmojiCodeList(){    return [        // Various 'older' charactes, dingbats etc. which over time have        // received an additional emoji representation.        0x203c,        0x2049,        0x2122,        0x2139,        range(0x2194, 0x2199),        range(0x21a9, 0x21aa),        range(0x231a, 0x231b),        0x2328,        range(0x23ce, 0x23cf),        range(0x23e9, 0x23f3),        range(0x23f8, 0x23fa),        0x24c2,        range(0x25aa, 0x25ab),        0x25b6,        0x25c0,        range(0x25fb, 0x25fe),        range(0x2600, 0x2604),        0x260e,        0x2611,        range(0x2614, 0x2615),        0x2618,        0x261d,        0x2620,        range(0x2622, 0x2623),        0x2626,        0x262a,        range(0x262e, 0x262f),        range(0x2638, 0x263a),        0x2640,        0x2642,        range(0x2648, 0x2653),        0x2660,        0x2663,        range(0x2665, 0x2666),        0x2668,        0x267b,        0x267f,        range(0x2692, 0x2697),        0x2699,        range(0x269b, 0x269c),        range(0x26a0, 0x26a1),        range(0x26aa, 0x26ab),        range(0x26b0, 0x26b1),        range(0x26bd, 0x26be),        range(0x26c4, 0x26c5),        0x26c8,        range(0x26ce, 0x26cf),        0x26d1,        range(0x26d3, 0x26d4),        range(0x26e9, 0x26ea),        range(0x26f0, 0x26f5),        range(0x26f7, 0x26fa),        0x26fd,        0x2702,        0x2705,        range(0x2708, 0x270d),        0x270f,        0x2712,        0x2714,        0x2716,        0x271d,        0x2721,        0x2728,        range(0x2733, 0x2734),        0x2744,        0x2747,        0x274c,        0x274e,        range(0x2753, 0x2755),        0x2757,        range(0x2763, 0x2764),        range(0x2795, 0x2797),        0x27a1,        0x27b0,        0x27bf,        range(0x2934, 0x2935),        range(0x2b05, 0x2b07),        range(0x2b1b, 0x2b1c),        0x2b50,        0x2b55,        0x3030,        0x303d,        0x3297,        0x3299,        // Modifier for emoji sequences.        0x200d,        0x20e3,        0xfe0f,        // 'Regular' emoji unicode space, containing the bulk of them.        range(0x1f000, 0x1f9cf)    ];}    

}