Why does an emoji have two different UTF-8 encodings? How do I convert an emoji from UTF-8 to an NSString on iOS?
0xF0, 0x9F, 0x98, 0x81
Is the correct UTF-8 encoding for U+1F601 😁.
0xED, 0xA0, 0xBD, 0xED, 0xB8, 0x81
Is not a valid UTF-8 sequence (*). It should really be rejected, and iOS is correct to do so.
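As a quick cross-check outside iOS, here is a minimal Python sketch that feeds both byte sequences to a strict UTF-8 decoder. The helper name decodes_as_utf8 is made up for this illustration; Python's built-in "utf-8" codec is used only because it follows the strict rules and refuses encoded surrogates.

```python
# The two byte sequences from the question.
good = bytes([0xF0, 0x9F, 0x98, 0x81])
bad = bytes([0xED, 0xA0, 0xBD, 0xED, 0xB8, 0x81])


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the bytes (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


good_text = good.decode("utf-8")  # a single code point, U+1F601

print(decodes_as_utf8(good))
print(decodes_as_utf8(bad))
```

The first sequence decodes to one code point; the second is rejected, which matches what iOS does.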
This is a bug in the bianma tool: the convertUtf8BytesToUnicodeCodePoints function is more lenient about what input it accepts than the algorithm specified in e.g. RFC 3629.
This happens to return a working string only because the tool is written in JavaScript. Having decoded the above byte sequence to the bogus surrogate code point sequence U+D83D, U+DE01, it then converts that into a JavaScript string using a direct code-point-to-code-unit mapping, giving \uD83D\uDE01. As this is the correct way to encode 😁 in a UTF-16 string, it appears to have worked.
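To make the arithmetic concrete, here is a rough Python sketch of what a lenient decoder effectively computes. The function names are invented for this illustration and are not taken from the tool's source: each 3-byte group is unpacked with the ordinary UTF-8 bit layout, and the resulting pair is then combined with the standard UTF-16 surrogate formula.

```python
def decode_three_byte(b0: int, b1: int, b2: int) -> int:
    """Unpack a 3-byte UTF-8 style group without any validity checks (illustrative only)."""
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def combine_surrogates(high: int, low: int) -> int:
    """Standard UTF-16 formula for turning a surrogate pair into a code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high = decode_three_byte(0xED, 0xA0, 0xBD)
low = decode_three_byte(0xED, 0xB8, 0x81)
code_point = combine_surrogates(high, low)

print(hex(high), hex(low), hex(code_point))
```

The two groups come out as 0xD83D and 0xDE01, and the pair combines to 0x1F601, which is why the result looks right once it lands in a UTF-16 string.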
(*: It is a valid CESU-8 sequence, but that encoding is just “bogus broken encoding for compatibility with badly-written historical tools” and should generally be avoided.)
You should not usually encounter a sequence like this; it is typically not worth catering for unless you have a specific source of this kind of malformed data which you don't have the power to get fixed.
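If you really are stuck with such a source, one possible workaround is to decode leniently and then merge the surrogate pairs yourself. The sketch below is in Python rather than Objective-C, and decode_cesu8 is a hypothetical helper name. It relies on Python's "surrogatepass" error handler and is not a strict CESU-8 validator, since it also accepts ordinary 4-byte UTF-8 sequences.

```python
def decode_cesu8(data: bytes) -> str:
    """Leniently decode CESU-8 style bytes into a normal string (hypothetical helper)."""
    # "surrogatepass" lets the UTF-8 codec accept the 3-byte encoded surrogates,
    # producing a string that temporarily contains lone surrogate code points.
    with_surrogates = data.decode("utf-8", "surrogatepass")
    # Round-tripping through UTF-16 merges each high/low pair into one code point.
    # An unpaired surrogate raises UnicodeDecodeError at the final decode step.
    utf16 = with_surrogates.encode("utf-16-le", "surrogatepass")
    return utf16.decode("utf-16-le")


repaired = decode_cesu8(bytes([0xED, 0xA0, 0xBD, 0xED, 0xB8, 0x81]))
print(repaired == "\U0001F601")
```

The same idea carries over to other platforms: decode the 3-byte groups to UTF-16 code units, then build the string from those code units rather than from the raw bytes.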