Find out if Character in String is emoji?
What I stumbled upon is the difference between characters, unicode scalars and glyphs.
For example, the glyph ๐จโ๐จโ๐งโ๐ง consists of 7 unicode scalars:
- Four emoji characters: ๐จ๐ฉ๐ง๐ง
- In between each emoji is a special character, which works like character glue; see the specs for more info
Another example, the glyph ๐๐ฟ consists of 2 unicode scalars:
- The regular emoji: ๐
- A skin tone modifier: ๐ฟ
Last one, the glyph 1๏ธโฃ contains three unicode characters:
So when rendering the characters, the resulting glyphs really matter.
Swift 5.0 and above makes this process much easier and gets rid of some guesswork we needed to do. Unicode.Scalar
's new Property
type helps is determine what we're dealing with. However, those properties only make sense when checking the other scalars within the glyph. This is why we'll be adding some convenience methods to the Character class to help us out.
For more detail, I wrote an article explaining how this works.
For Swift 5.0, it leaves you with the following result:
extension Character { /// A simple emoji is one scalar and presented to the user as an Emoji var isSimpleEmoji: Bool { guard let firstScalar = unicodeScalars.first else { return false } return firstScalar.properties.isEmoji && firstScalar.value > 0x238C } /// Checks if the scalars will be merged into an emoji var isCombinedIntoEmoji: Bool { unicodeScalars.count > 1 && unicodeScalars.first?.properties.isEmoji ?? false } var isEmoji: Bool { isSimpleEmoji || isCombinedIntoEmoji }}extension String { var isSingleEmoji: Bool { count == 1 && containsEmoji } var containsEmoji: Bool { contains { $0.isEmoji } } var containsOnlyEmoji: Bool { !isEmpty && !contains { !$0.isEmoji } } var emojiString: String { emojis.map { String($0) }.reduce("", +) } var emojis: [Character] { filter { $0.isEmoji } } var emojiScalars: [UnicodeScalar] { filter { $0.isEmoji }.flatMap { $0.unicodeScalars } }}
Which will give you the following results:
"Aฬอฬ".containsEmoji // false"3".containsEmoji // false"Aฬอฬโถ๏ธ".unicodeScalars // [65, 795, 858, 790, 9654, 65039]"Aฬอฬโถ๏ธ".emojiScalars // [9654, 65039]"3๏ธโฃ".isSingleEmoji // true"3๏ธโฃ".emojiScalars // [51, 65039, 8419]"๐๐ฟ".isSingleEmoji // true"๐๐ผโโ๏ธ".isSingleEmoji // true"๐น๐ฉ".isSingleEmoji // true"โฐ".isSingleEmoji // true"๐ถ".isSingleEmoji // true"๐จโ๐ฉโ๐งโ๐ง".isSingleEmoji // true"๐ด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ".isSingleEmoji // true"๐ด๓ ง๓ ข๓ ฅ๓ ฎ๓ ง๓ ฟ".containsOnlyEmoji // true"๐จโ๐ฉโ๐งโ๐ง".containsOnlyEmoji // true"Hello ๐จโ๐ฉโ๐งโ๐ง".containsOnlyEmoji // false"Hello ๐จโ๐ฉโ๐งโ๐ง".containsEmoji // true"๐ซ Hรฉllo ๐จโ๐ฉโ๐งโ๐ง".emojiString // "๐ซ๐จโ๐ฉโ๐งโ๐ง""๐จโ๐ฉโ๐งโ๐ง".count // 1"๐ซ Hรฉllล ๐จโ๐ฉโ๐งโ๐ง".emojiScalars // [128107, 128104, 8205, 128105, 8205, 128103, 8205, 128103]"๐ซ Hรฉllล ๐จโ๐ฉโ๐งโ๐ง".emojis // ["๐ซ", "๐จโ๐ฉโ๐งโ๐ง"]"๐ซ Hรฉllล ๐จโ๐ฉโ๐งโ๐ง".emojis.count // 2"๐ซ๐จโ๐ฉโ๐งโ๐ง๐จโ๐จโ๐ฆ".isSingleEmoji // false"๐ซ๐จโ๐ฉโ๐งโ๐ง๐จโ๐จโ๐ฆ".containsOnlyEmoji // true
For older Swift versions, check out this gist containing my old code.
The simplest, cleanest, and swiftiest way to accomplish this is to simply check the Unicode code points for each character in the string against known emoji and dingbats ranges, like so:
extension String { var containsEmoji: Bool { for scalar in unicodeScalars { switch scalar.value { case 0x1F600...0x1F64F, // Emoticons 0x1F300...0x1F5FF, // Misc Symbols and Pictographs 0x1F680...0x1F6FF, // Transport and Map 0x2600...0x26FF, // Misc symbols 0x2700...0x27BF, // Dingbats 0xFE00...0xFE0F, // Variation Selectors 0x1F900...0x1F9FF, // Supplemental Symbols and Pictographs 0x1F1E6...0x1F1FF: // Flags return true default: continue } } return false }}
Swift 5.0
โฆ introduced a new way of checking exactly this!
You have to break your String
into its Scalars
. Each Scalar
has a Property
value which supports the isEmoji
value!
Actually you can even check if the Scalar is a Emoji modifier or more. Check out Apple's documentation: https://developer.apple.com/documentation/swift/unicode/scalar/properties
You may want to consider checking for isEmojiPresentation
instead of isEmoji
, because Apple states the following for isEmoji
:
This property is true for scalars that are rendered as emoji by default and also for scalars that have a non-default emoji rendering when followed by U+FE0F VARIATION SELECTOR-16. This includes some scalars that are not typically considered to be emoji.
This way actually splits up Emoji's into all the modifiers, but it is way simpler to handle. And as Swift now counts Emoji's with modifiers (e.g.: ๐จโ๐ฉโ๐งโ๐ฆ, ๐จ๐ปโ๐ป, ๐ด) as 1 you can do all kind of stuff.
var string = "๐ค test"for scalar in string.unicodeScalars { let isEmoji = scalar.properties.isEmoji print("\(scalar.description) \(isEmoji)"))}// ๐ค true// false// t false// e false// s false// t false
NSHipster points out an interesting way to get all Emoji's:
import Foundationvar emoji = CharacterSet()for codePoint in 0x0000...0x1F0000 { guard let scalarValue = Unicode.Scalar(codePoint) else { continue } // Implemented in Swift 5 (SE-0221) // https://github.com/apple/swift-evolution/blob/master/proposals/0221-character-properties.md if scalarValue.properties.isEmoji { emoji.insert(scalarValue) }}