Working with Unicode code points in Swift


Updated for Swift 3

String and Character

For almost everyone who visits this question in the future, String and Character will be the answer.

Set Unicode values directly in code:

var str: String = "I want to visit 北京, Москва, मुंबई, القاهرة, and 서울시. 😊"
var character: Character = "🌍"

Use hexadecimal to set values

var str: String = "\u{61}\u{5927}\u{1F34E}\u{3C0}" // a大🍎π
var character: Character = "\u{65}\u{301}" // é = "e" + accent mark

Note that the Swift Character can be composed of multiple Unicode code points, but appears to be a single character. This is called an Extended Grapheme Cluster.
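To make the grapheme cluster idea concrete, here is a small sketch. It assumes Swift 4 or later, where String has a count property (in Swift 3 you would write characters.count instead). The variable names are only for illustration.

```swift
// "é" written as a base letter followed by a combining acute accent
let decomposed: Character = "\u{65}\u{301}"
// "é" written as a single precomposed code point
let precomposed: Character = "\u{E9}"

// The decomposed form holds two Unicode code points...
let scalarCount = String(decomposed).unicodeScalars.count
// ...but Swift still treats it as one Character
let characterCount = String(decomposed).count
// Character comparison uses canonical equivalence, so both forms match
let sameCharacter = (decomposed == precomposed)

print(scalarCount, characterCount, sameCharacter) // 2 1 true
```

This is why counting Characters and counting code points can give different answers for the same text.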

See this question also.

Convert to Unicode values:

str.utf8
str.utf16
str.unicodeScalars // UTF-32

String(character).utf8
String(character).utf16
String(character).unicodeScalars
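As a rough illustration of what those three views contain, the sketch below copies each one into an array for a single emoji. The variable names are my own, not part of any API.

```swift
let apple = "🍎" // U+1F34E, outside the Basic Multilingual Plane

// Copy each view into a plain array of code units
let utf8Units = Array(apple.utf8)     // four 8-bit code units
let utf16Units = Array(apple.utf16)   // two 16-bit code units (a surrogate pair)
let scalarValues = apple.unicodeScalars.map { $0.value } // one 32-bit value

// Print each array in hexadecimal
print(utf8Units.map { String($0, radix: 16) })    // ["f0", "9f", "8d", "8e"]
print(utf16Units.map { String($0, radix: 16) })   // ["d83c", "df4e"]
print(scalarValues.map { String($0, radix: 16) }) // ["1f34e"]
```

Only the unicodeScalars view gives you the code point itself; the other two give you its encoded form.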

Convert from Unicode hex values:

let hexValue: UInt32 = 0x1F34E

// convert hex value to UnicodeScalar
guard let scalarValue = UnicodeScalar(hexValue) else {
    // early exit if hex does not form a valid unicode value
    return
}

// convert UnicodeScalar to String
let myString = String(scalarValue) // 🍎

Or alternatively:

let hexValue: UInt32 = 0x1F34E
if let scalarValue = UnicodeScalar(hexValue) {
    let myString = String(scalarValue)
}
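The optional unwrapping matters because not every UInt32 is a valid Unicode scalar. Here is a minimal sketch that wraps the conversion in a helper. The function name string(fromScalarValue:) is hypothetical, something I made up for this example.

```swift
// Hypothetical helper: returns nil when the value is not a valid Unicode scalar
func string(fromScalarValue value: UInt32) -> String? {
    guard let scalar = UnicodeScalar(value) else {
        return nil
    }
    return String(scalar)
}

let apple = string(fromScalarValue: 0x1F34E)     // "🍎"
let surrogate = string(fromScalarValue: 0xD800)  // nil, surrogates are not scalars
let tooLarge = string(fromScalarValue: 0x110000) // nil, beyond U+10FFFF

print(apple ?? "invalid", surrogate ?? "invalid", tooLarge ?? "invalid")
```

Returning an optional pushes the decision about invalid input to the caller, which is the same trade-off the guard and if let versions make.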

A few more examples

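let value0: UInt8 = 0x61
let value1: UInt16 = 0x5927
let value2: UInt32 = 0x1F34E

// UnicodeScalar(UInt8) always succeeds, but the UInt16, UInt32, and Int
// initializers are failable, so their results must be unwrapped
let string0 = String(UnicodeScalar(value0)) // a
if let scalar1 = UnicodeScalar(value1) {
    let string1 = String(scalar1) // 大
}
if let scalar2 = UnicodeScalar(value2) {
    let string2 = String(scalar2) // 🍎
}

// convert hex array to String
let myHexArray = [0x43, 0x61, 0x74, 0x203C, 0x1F431] // an Int array
var myString = ""
for hexValue in myHexArray {
    if let scalar = UnicodeScalar(hexValue) {
        myString.unicodeScalars.append(scalar)
    }
}
print(myString) // Cat‼🐱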

Note that for UTF-8 and UTF-16 the conversion is not always this easy. (See UTF-8, UTF-16, and UTF-32 questions.)
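If you already have UTF-8 or UTF-16 code units, one way to avoid decoding them by hand is String(decoding:as:). This sketch assumes Swift 4 or later, which is newer than the Swift 3 target of the rest of this answer. Note that this initializer repairs invalid sequences by substituting U+FFFD rather than failing.

```swift
// "Cat‼🐱" as UTF-16: the cat emoji needs a surrogate pair (0xD83D, 0xDC31)
let utf16Units: [UInt16] = [0x43, 0x61, 0x74, 0x203C, 0xD83D, 0xDC31]

// The same text as UTF-8: ‼ takes three bytes and the cat emoji takes four
let utf8Units: [UInt8] = [0x43, 0x61, 0x74, 0xE2, 0x80, 0xBC, 0xF0, 0x9F, 0x90, 0xB1]

// Let the standard library combine the code units into scalars
let fromUTF16 = String(decoding: utf16Units, as: UTF16.self)
let fromUTF8 = String(decoding: utf8Units, as: UTF8.self)

print(fromUTF16, fromUTF8) // Cat‼🐱 Cat‼🐱
```

Both inputs have more code units than the five scalars they encode, which is exactly why a one-to-one conversion does not work here.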

NSString and unichar

It is also possible to work with NSString and unichar in Swift, but unless you are familiar with Objective-C and comfortable converting its syntax to Swift, good documentation will be hard to find.

Also, unichar is a type alias for UInt16 (a single UTF-16 code unit), and as mentioned above the conversion from UInt16 to Unicode scalar values is not always easy (e.g., combining surrogate pairs for things like emoji and other characters in the upper code planes).
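Here is a sketch of what the surrogate pair problem looks like with NSString. It assumes Foundation is available, and the arithmetic is the standard UTF-16 surrogate formula written out by hand for illustration.

```swift
import Foundation

let apple = NSString(string: "🍎") // U+1F34E

// NSString counts UTF-16 code units, so one emoji has a length of 2
let unitCount = apple.length

// character(at:) returns a unichar (UInt16), here the two surrogates
let high = UInt32(apple.character(at: 0)) // 0xD83C
let low = UInt32(apple.character(at: 1))  // 0xDF4E

// Combine the surrogate pair back into a single scalar value
let scalarValue = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print(unitCount, String(scalarValue, radix: 16)) // 2 1f34e
```

Real code would also have to check that each unit is actually a high or low surrogate before combining them, which this sketch skips.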

Custom string structure

For the reasons mentioned in the question, I ended up not using any of the above methods. Instead I wrote my own string structure, which was basically an array of UInt32 to hold Unicode scalar values.

Again, this is not the solution for most people. First consider using extensions if you only need to extend the functionality of String or Character a little.
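As a rough idea of what the extension route might look like, here is a minimal sketch. The names scalarValues and init(scalarValues:) are hypothetical, chosen just for this example, and invalid values are silently skipped, which may or may not be what you want.

```swift
extension String {
    // The string's Unicode scalar values as plain integers
    var scalarValues: [UInt32] {
        return unicodeScalars.map { $0.value }
    }

    // Builds a string from scalar values, skipping any invalid ones
    init(scalarValues: [UInt32]) {
        self.init()
        for value in scalarValues {
            if let scalar = UnicodeScalar(value) {
                unicodeScalars.append(scalar)
            }
        }
    }
}

let values = "a大🍎π".scalarValues
let roundTrip = String(scalarValues: values)
print(values, roundTrip)
```

With something like this you can drop down to UInt32 values for the manipulation and come back to a String at the end, without maintaining a separate type.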

But if you really need to work exclusively with Unicode scalar values, you could write a custom struct.

The advantages are:

  • No need to constantly switch between types (String, Character, UnicodeScalar, UInt32, etc.) when doing string manipulation.
  • After Unicode manipulation is finished, the final conversion to String is easy.
  • Easy to add more methods when they are needed
  • Simplifies converting code from Java or other languages

Disadvantages are:

  • makes code less portable and less readable for other Swift developers
  • not as well tested and optimized as the native Swift types
  • it is yet another file that has to be included in a project every time you need it

You can make your own, but here is mine for reference. The hardest part was making it Hashable.

// This struct is an array of UInt32 to hold Unicode scalar values
// Version 3.4.0 (Swift 3 update)

struct ScalarString: Sequence, Hashable, CustomStringConvertible {

    fileprivate var scalarArray: [UInt32] = []

    init() {
        // does anything need to go here?
    }

    init(_ character: UInt32) {
        self.scalarArray.append(character)
    }

    init(_ charArray: [UInt32]) {
        for c in charArray {
            self.scalarArray.append(c)
        }
    }

    init(_ string: String) {
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // Iterator in order to conform to Sequence protocol
    // (to allow users to iterate as in `for myScalarValue in myScalarString` { ... })
    func makeIterator() -> AnyIterator<UInt32> {
        return AnyIterator(scalarArray.makeIterator())
    }

    // append
    mutating func append(_ scalar: UInt32) {
        self.scalarArray.append(scalar)
    }

    mutating func append(_ scalarString: ScalarString) {
        for scalar in scalarString {
            self.scalarArray.append(scalar)
        }
    }

    mutating func append(_ string: String) {
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // charAt
    func charAt(_ index: Int) -> UInt32 {
        return self.scalarArray[index]
    }

    // clear
    mutating func clear() {
        self.scalarArray.removeAll(keepingCapacity: true)
    }

    // contains
    func contains(_ character: UInt32) -> Bool {
        for scalar in self.scalarArray {
            if scalar == character {
                return true
            }
        }
        return false
    }

    // description (to implement CustomStringConvertible protocol)
    var description: String {
        return self.toString()
    }

    // endsWith
    func endsWith() -> UInt32? {
        return self.scalarArray.last
    }

    // indexOf
    // returns first index of scalar string match
    func indexOf(_ string: ScalarString) -> Int? {

        if scalarArray.count < string.length {
            return nil
        }

        for i in 0...(scalarArray.count - string.length) {

            for j in 0..<string.length {

                if string.charAt(j) != scalarArray[i + j] {
                    break // substring mismatch
                }
                if j == string.length - 1 {
                    return i
                }
            }
        }

        return nil
    }

    // insert
    mutating func insert(_ scalar: UInt32, atIndex index: Int) {
        self.scalarArray.insert(scalar, at: index)
    }
    mutating func insert(_ string: ScalarString, atIndex index: Int) {
        var newIndex = index
        for scalar in string {
            self.scalarArray.insert(scalar, at: newIndex)
            newIndex += 1
        }
    }
    mutating func insert(_ string: String, atIndex index: Int) {
        var newIndex = index
        for scalar in string.unicodeScalars {
            self.scalarArray.insert(scalar.value, at: newIndex)
            newIndex += 1
        }
    }

    // isEmpty
    var isEmpty: Bool {
        return self.scalarArray.count == 0
    }

    // hashValue (to implement Hashable protocol)
    var hashValue: Int {

        // DJB Hash Function
        return self.scalarArray.reduce(5381) {
            ($0 << 5) &+ $0 &+ Int($1)
        }
    }

    // length
    var length: Int {
        return self.scalarArray.count
    }

    // remove character
    mutating func removeCharAt(_ index: Int) {
        self.scalarArray.remove(at: index)
    }
    func removingAllInstancesOfChar(_ character: UInt32) -> ScalarString {

        var returnString = ScalarString()

        for scalar in self.scalarArray {
            if scalar != character {
                returnString.append(scalar)
            }
        }

        return returnString
    }
    func removeRange(_ range: CountableRange<Int>) -> ScalarString? {

        if range.lowerBound < 0 || range.upperBound > scalarArray.count {
            return nil
        }

        var returnString = ScalarString()

        for i in 0..<scalarArray.count {
            if i < range.lowerBound || i >= range.upperBound {
                returnString.append(scalarArray[i])
            }
        }

        return returnString
    }

    // replace
    func replace(_ character: UInt32, withChar replacementChar: UInt32) -> ScalarString {

        var returnString = ScalarString()

        for scalar in self.scalarArray {
            if scalar == character {
                returnString.append(replacementChar)
            } else {
                returnString.append(scalar)
            }
        }
        return returnString
    }
    func replace(_ character: UInt32, withString replacementString: String) -> ScalarString {

        var returnString = ScalarString()

        for scalar in self.scalarArray {
            if scalar == character {
                returnString.append(replacementString)
            } else {
                returnString.append(scalar)
            }
        }
        return returnString
    }
    func replaceRange(_ range: CountableRange<Int>, withString replacementString: ScalarString) -> ScalarString {

        var returnString = ScalarString()

        for i in 0..<scalarArray.count {
            if i < range.lowerBound || i >= range.upperBound {
                returnString.append(scalarArray[i])
            } else if i == range.lowerBound {
                returnString.append(replacementString)
            }
        }
        return returnString
    }

    // set (an alternative to myScalarString = "some string")
    mutating func set(_ string: String) {
        self.scalarArray.removeAll(keepingCapacity: false)
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // split
    func split(atChar splitChar: UInt32) -> [ScalarString] {
        var partsArray: [ScalarString] = []
        if self.scalarArray.count == 0 {
            return partsArray
        }
        var part: ScalarString = ScalarString()
        for scalar in self.scalarArray {
            if scalar == splitChar {
                partsArray.append(part)
                part = ScalarString()
            } else {
                part.append(scalar)
            }
        }
        partsArray.append(part)
        return partsArray
    }

    // startsWith
    func startsWith() -> UInt32? {
        return self.scalarArray.first
    }

    // substring
    func substring(_ startIndex: Int) -> ScalarString {
        // from startIndex to end of string
        var subArray: ScalarString = ScalarString()
        for i in startIndex..<self.length {
            subArray.append(self.scalarArray[i])
        }
        return subArray
    }
    func substring(_ startIndex: Int, _ endIndex: Int) -> ScalarString {
        // (startIndex is inclusive, endIndex is exclusive)
        var subArray: ScalarString = ScalarString()
        for i in startIndex..<endIndex {
            subArray.append(self.scalarArray[i])
        }
        return subArray
    }

    // toString
    func toString() -> String {
        var string: String = ""

        for scalar in self.scalarArray {
            if let validScalar = UnicodeScalar(scalar) {
                string.append(Character(validScalar))
            }
        }
        return string
    }

    // trim
    // removes leading and trailing whitespace (space, tab, newline)
    func trim() -> ScalarString {

        //var returnString = ScalarString()
        let space: UInt32 = 0x00000020
        let tab: UInt32 = 0x00000009
        let newline: UInt32 = 0x0000000A

        var startIndex = self.scalarArray.count
        var endIndex = 0

        // leading whitespace
        for i in 0..<self.scalarArray.count {
            if self.scalarArray[i] != space &&
                self.scalarArray[i] != tab &&
                self.scalarArray[i] != newline {

                startIndex = i
                break
            }
        }

        // trailing whitespace
        for i in stride(from: (self.scalarArray.count - 1), through: 0, by: -1) {
            if self.scalarArray[i] != space &&
                self.scalarArray[i] != tab &&
                self.scalarArray[i] != newline {

                endIndex = i + 1
                break
            }
        }

        if endIndex <= startIndex {
            return ScalarString()
        }

        return self.substring(startIndex, endIndex)
    }

    // values
    func values() -> [UInt32] {
        return self.scalarArray
    }

}

func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.scalarArray == right.scalarArray
}

func +(left: ScalarString, right: ScalarString) -> ScalarString {
    var returnString = ScalarString()
    for scalar in left.values() {
        returnString.append(scalar)
    }
    for scalar in right.values() {
        returnString.append(scalar)
    }
    return returnString
}


//Swift 3.0
// This struct is an array of UInt32 to hold Unicode scalar values
struct ScalarString: Sequence, Hashable, CustomStringConvertible {

    private var scalarArray: [UInt32] = []

    init() {
        // does anything need to go here?
    }

    init(_ character: UInt32) {
        self.scalarArray.append(character)
    }

    init(_ charArray: [UInt32]) {
        for c in charArray {
            self.scalarArray.append(c)
        }
    }

    init(_ string: String) {
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // Iterator in order to conform to Sequence protocol
    // (to allow users to iterate as in `for myScalarValue in myScalarString` { ... })
    //func generate() -> AnyIterator<UInt32> {
    func makeIterator() -> AnyIterator<UInt32> {
        // the index must be mutable so that each call advances to the next element
        var nextIndex = 0
        return AnyIterator {
            if (nextIndex > self.scalarArray.count-1) {
                return nil
            }
            let scalar = self.scalarArray[nextIndex]
            nextIndex += 1
            return scalar
        }
    }

    // append
    mutating func append(scalar: UInt32) {
        self.scalarArray.append(scalar)
    }

    mutating func append(scalarString: ScalarString) {
        for scalar in scalarString {
            self.scalarArray.append(scalar)
        }
    }

    mutating func append(string: String) {
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // charAt
    func charAt(index: Int) -> UInt32 {
        return self.scalarArray[index]
    }

    // clear
    mutating func clear() {
        self.scalarArray.removeAll(keepingCapacity: true)
    }

    // contains
    func contains(character: UInt32) -> Bool {
        for scalar in self.scalarArray {
            if scalar == character {
                return true
            }
        }
        return false
    }

    // description (to implement CustomStringConvertible protocol)
    var description: String {
        var string: String = ""
        for scalar in scalarArray {
            // UnicodeScalar(UInt32) is failable; skip values that are not valid scalars
            if let validScalar = UnicodeScalar(scalar) {
                string.append(Character(validScalar))
            }
        }
        return string
    }

    // endsWith
    func endsWith() -> UInt32? {
        return self.scalarArray.last
    }

    // insert
    mutating func insert(scalar: UInt32, atIndex index: Int) {
        self.scalarArray.insert(scalar, at: index)
    }

    // isEmpty
    var isEmpty: Bool {
        get {
            return self.scalarArray.count == 0
        }
    }

    // hashValue (to implement Hashable protocol)
    var hashValue: Int {
        get {
            // DJB Hash Function
            var hash = 5381
            for i in 0 ..< scalarArray.count {
                hash = ((hash << 5) &+ hash) &+ Int(self.scalarArray[i])
            }
            return hash
        }
    }

    // length
    var length: Int {
        get {
            return self.scalarArray.count
        }
    }

    // remove character
    mutating func removeCharAt(index: Int) {
        self.scalarArray.remove(at: index)
    }
    func removingAllInstancesOfChar(character: UInt32) -> ScalarString {
        var returnString = ScalarString()
        for scalar in self.scalarArray {
            if scalar != character {
                returnString.append(scalar: scalar) //.append(scalar)
            }
        }
        return returnString
    }

    // replace
    func replace(character: UInt32, withChar replacementChar: UInt32) -> ScalarString {
        var returnString = ScalarString()
        for scalar in self.scalarArray {
            if scalar == character {
                returnString.append(scalar: replacementChar) //.append(replacementChar)
            } else {
                returnString.append(scalar: scalar) //.append(scalar)
            }
        }
        return returnString
    }

    // func replace(character: UInt32, withString replacementString: String) -> ScalarString {
    func replace(character: UInt32, withString replacementString: ScalarString) -> ScalarString {
        var returnString = ScalarString()
        for scalar in self.scalarArray {
            if scalar == character {
                returnString.append(scalarString: replacementString) //.append(replacementString)
            } else {
                returnString.append(scalar: scalar) //.append(scalar)
            }
        }
        return returnString
    }

    // set (an alternative to myScalarString = "some string")
    mutating func set(string: String) {
        self.scalarArray.removeAll(keepingCapacity: false)
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // split
    func split(atChar splitChar: UInt32) -> [ScalarString] {
        var partsArray: [ScalarString] = []
        var part: ScalarString = ScalarString()
        for scalar in self.scalarArray {
            if scalar == splitChar {
                partsArray.append(part)
                part = ScalarString()
            } else {
                part.append(scalar: scalar) //.append(scalar)
            }
        }
        partsArray.append(part)
        return partsArray
    }

    // startsWith
    func startsWith() -> UInt32? {
        return self.scalarArray.first
    }

    // substring
    func substring(startIndex: Int) -> ScalarString {
        // from startIndex to end of string
        var subArray: ScalarString = ScalarString()
        for i in startIndex ..< self.length {
            subArray.append(scalar: self.scalarArray[i]) //.append(self.scalarArray[i])
        }
        return subArray
    }
    func substring(startIndex: Int, _ endIndex: Int) -> ScalarString {
        // (startIndex is inclusive, endIndex is exclusive)
        var subArray: ScalarString = ScalarString()
        for i in startIndex ..< endIndex {
            subArray.append(scalar: self.scalarArray[i]) //.append(self.scalarArray[i])
        }
        return subArray
    }

    // toString
    func toString() -> String {
        // must be a var: appending(_:) returns a new string and would discard the result
        var string: String = ""
        for scalar in self.scalarArray {
            if let validScalar = UnicodeScalar(scalar) {
                string.append(Character(validScalar))
            }
        }
        return string
    }

    // values
    func values() -> [UInt32] {
        return self.scalarArray
    }
}

func ==(left: ScalarString, right: ScalarString) -> Bool {
    if left.length != right.length {
        return false
    }
    for i in 0 ..< left.length {
        if left.charAt(index: i) != right.charAt(index: i) {
            return false
        }
    }
    return true
}

func +(left: ScalarString, right: ScalarString) -> ScalarString {
    var returnString = ScalarString()
    for scalar in left.values() {
        returnString.append(scalar: scalar) //.append(scalar)
    }
    for scalar in right.values() {
        returnString.append(scalar: scalar) //.append(scalar)
    }
    return returnString
}
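The hand-written hashValue and == above target Swift 3. On newer compilers (Swift 4.1 or later, which is an assumption beyond this answer's Swift 3 target), the compiler can synthesize Hashable for a struct whose stored properties are all Hashable. A stripped-down sketch, using the hypothetical name MiniScalarString so it does not clash with the struct above:

```swift
// Hypothetical cut-down type: Equatable and Hashable are synthesized
// because its only stored property, [UInt32], is itself Hashable
struct MiniScalarString: Hashable {
    var scalars: [UInt32]

    init(_ string: String) {
        scalars = string.unicodeScalars.map { $0.value }
    }
}

let first = MiniScalarString("cat")
let second = MiniScalarString("cat")
let third = MiniScalarString("dog")

// Equal values collapse to one entry when stored in a Set
let unique: Set<MiniScalarString> = [first, second, third]

// The type can also be used as a Dictionary key
var counts: [MiniScalarString: Int] = [:]
for item in [first, second, third] {
    counts[item, default: 0] += 1
}

print(unique.count, counts[first] ?? 0) // 2 2
```

Whether this is preferable to the DJB hash depends on which Swift version you need to support.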