Using Objective-C/Cocoa to unescape Unicode characters, i.e. \u1234

~~It's correct that Cocoa does not offer a solution~~, yet Core Foundation does: CFStringTransform.

CFStringTransform lives in a dusty, remote corner of Mac OS (and iOS), so it's a little-known gem. It is the front end to Apple's ICU-compatible string transformation engine. It can perform real magic like transliteration between Greek and Latin (or just about any known scripts), but it can also be used for mundane tasks like unescaping strings from a crappy server:

    NSString *input = @"\\u5404\\u500b\\u90fd";
    // CFStringTransform works in place, so it needs a mutable copy
    NSMutableString *convertedString = [input mutableCopy];

    CFStringRef transform = CFSTR("Any-Hex/Java");
    CFStringTransform((__bridge CFMutableStringRef)convertedString, NULL, transform, YES);

    NSLog(@"convertedString: %@", convertedString);
    // prints: 各個都, tada!

As I said, CFStringTransform is really powerful. It supports a number of predefined transforms, like case mappings, normalizations, or Unicode character name conversion. You can even design your own transformations.
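As a rough sketch of the predefined transforms mentioned above: the helper name `GRApplyTransform` and the demo function are made up for illustration, while the `kCFStringTransform…` constants are the ones declared in CFString.h. The sample inputs are arbitrary.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not an Apple API): applies a transform to a mutable
// copy of `input` and returns nil if CFStringTransform reports failure.
static NSString *GRApplyTransform(NSString *input, CFStringRef transform, BOOL reverse) {
    NSMutableString *buffer = [input mutableCopy];
    Boolean ok = CFStringTransform((__bridge CFMutableStringRef)buffer, NULL, transform, reverse);
    return ok ? [buffer copy] : nil;
}

// Hypothetical demo: logs the result of a few predefined transforms.
static void GRLogSampleTransforms(void) {
    // Transliterate Greek script to Latin script.
    NSLog(@"%@", GRApplyTransform(@"Ελληνικά", kCFStringTransformToLatin, NO));
    // Strip accents and other combining marks.
    NSLog(@"%@", GRApplyTransform(@"café", kCFStringTransformStripCombiningMarks, NO));
    // Replace each character with its Unicode character name.
    NSLog(@"%@", GRApplyTransform(@"é", kCFStringTransformToUnicodeName, NO));
    // Unescape \uXXXX sequences with the ICU transform identifier.
    NSLog(@"%@", GRApplyTransform(@"\\u5404\\u500b\\u90fd", CFSTR("Any-Hex/Java"), YES));
}
```

Working on a copy keeps the caller's string untouched, and checking the Boolean result avoids returning a string that was never transformed.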

~~I have no idea why Apple does not make it available from Cocoa.~~

Edit 2015:

OS X 10.11 and iOS 9 add the following method to Foundation:

- (nullable NSString *)stringByApplyingTransform:(NSString *)transform reverse:(BOOL)reverse;

So the example from above becomes...

    NSString *input = @"\\u5404\\u500b\\u90fd";
    NSString *convertedString = [input stringByApplyingTransform:@"Any-Hex/Java"
                                                         reverse:YES];

    NSLog(@"convertedString: %@", convertedString);
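If the code still has to run on systems older than OS X 10.11 or iOS 9, one possible arrangement is to prefer the Foundation method and fall back to CFStringTransform. This is only a sketch: the helper name `GRUnescapeJavaHex` is invented here, and it assumes a compiler that supports `@available`.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: unescapes \uXXXX sequences, using the Foundation method
// where it exists and CFStringTransform otherwise. Returns nil on failure.
static NSString *GRUnescapeJavaHex(NSString *input) {
    if (@available(macOS 10.11, iOS 9.0, *)) {
        // The method is declared nullable, so callers should expect nil.
        return [input stringByApplyingTransform:@"Any-Hex/Java" reverse:YES];
    }
    // Older systems: transform a mutable copy in place.
    NSMutableString *buffer = [input mutableCopy];
    Boolean ok = CFStringTransform((__bridge CFMutableStringRef)buffer, NULL,
                                   CFSTR("Any-Hex/Java"), YES);
    return ok ? [buffer copy] : nil;
}
```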

Thanks @nschmidt for the heads up.

objective-c cocoa unicode

There is no built-in function to do C unescaping.

You can cheat a little with NSPropertyListSerialization since an "old text style" plist supports C escaping via \Uxxxx:

    NSString* input = @"ab\"cA\"BC\\u2345\\u0123";

    // will cause trouble if you have "abc\\\\uvw"
    NSString* esc1 = [input stringByReplacingOccurrencesOfString:@"\\u" withString:@"\\U"];
    NSString* esc2 = [esc1 stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];
    NSString* quoted = [[@"\"" stringByAppendingString:esc2] stringByAppendingString:@"\""];
    NSData* data = [quoted dataUsingEncoding:NSUTF8StringEncoding];
    NSString* unesc = [NSPropertyListSerialization propertyListFromData:data
                       mutabilityOption:NSPropertyListImmutable format:NULL
                       errorDescription:NULL];
    assert([unesc isKindOfClass:[NSString class]]);
    NSLog(@"Output = %@", unesc);

but mind that this isn't very efficient. It's far better if you write up your own parser. (BTW are you decoding JSON strings? If yes you could use the existing JSON parsers.)
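For the JSON case, a minimal sketch of leaning on an existing parser could look like the following. The helper name `GRUnescapeViaJSON` is made up, and it assumes the input is the body of a JSON string literal, meaning any quotes or backslashes in it are already escaped the way JSON requires.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: wraps `body` in quotes so it becomes a JSON string
// literal, then lets NSJSONSerialization decode the escapes.
// Returns nil if the result is not valid JSON.
static NSString *GRUnescapeViaJSON(NSString *body) {
    NSString *quoted = [NSString stringWithFormat:@"\"%@\"", body];
    NSData *data = [quoted dataUsingEncoding:NSUTF8StringEncoding];
    if (data == nil) {
        return nil;
    }
    NSError *error = nil;
    // AllowFragments permits a top-level value that is not an array or object.
    id value = [NSJSONSerialization JSONObjectWithData:data
                                               options:NSJSONReadingAllowFragments
                                                 error:&error];
    return [value isKindOfClass:[NSString class]] ? value : nil;
}
```

Unlike the plist trick, this rejects malformed escapes instead of passing them through, which may or may not be what you want for data from an unreliable server.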


Here's what I ended up writing. Hopefully this will help some people along.

    + (NSString*) unescapeUnicodeString:(NSString*)string
    {
        // unescape quotes and backslashes
        NSString* unescapedString = [string stringByReplacingOccurrencesOfString:@"\\\"" withString:@"\""];
        unescapedString = [unescapedString stringByReplacingOccurrencesOfString:@"\\\\" withString:@"\\"];

        // tokenize based on the unicode escape marker
        NSMutableString* tokenizedString = [NSMutableString string];
        NSScanner* scanner = [NSScanner scannerWithString:unescapedString];
        // by default the scanner skips whitespace before each scan,
        // which would silently drop spaces from the output
        scanner.charactersToBeSkipped = nil;
        while ([scanner isAtEnd] == NO)
        {
            // read up to the first unicode marker
            // if a string has been scanned, it's a token
            // and should be appended to the tokenized string
            NSString* token = @"";
            [scanner scanUpToString:@"\\u" intoString:&token];
            if (token != nil && token.length > 0)
            {
                [tokenizedString appendString:token];
                continue;
            }

            // the scanner is now sitting on a marker
            // check whether the marker plus 4 hex digits would run
            // beyond the end of the string (could be malformed)
            // and if so, copy the remainder as-is and move
            // the scanner to the end
            NSUInteger location = [scanner scanLocation];
            NSUInteger remaining = scanner.string.length - location;
            if (remaining < 2 + 4)
            {
                [tokenizedString appendString:[scanner.string substringFromIndex:location]];
                [scanner setScanLocation:scanner.string.length];
                continue;
            }

            // move the location past the unicode marker
            // then read in the next 4 characters
            location += 2;
            NSRange range = {location, 4};
            token = [scanner.string substringWithRange:range];
            unichar codeValue = (unichar) strtol([token UTF8String], NULL, 16);
            [tokenizedString appendString:[NSString stringWithFormat:@"%C", codeValue]];

            // move the scanner past the 4 characters
            // then keep scanning
            location += 4;
            [scanner setScanLocation:location];
        }

        // done
        return tokenizedString;
    }

    + (NSString*) escapeUnicodeString:(NSString*)string
    {
        // first escape backslashes and quotes
        // note that the backslash has to be escaped before the quote
        // otherwise it will end up with an extra backslash
        NSString* escapedString = [string stringByReplacingOccurrencesOfString:@"\\" withString:@"\\\\"];
        escapedString = [escapedString stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];

        // convert to encoded unicode
        // do this by getting the data for the string
        // in UTF16 little endian (this matches the host byte order on
        // Apple platforms, which the uint16_t reads below rely on)
        NSData* data = [escapedString dataUsingEncoding:NSUTF16LittleEndianStringEncoding allowLossyConversion:YES];
        size_t bytesRead = 0;
        const char* bytes = data.bytes;
        NSMutableString* encodedString = [NSMutableString string];

        // loop through the byte array
        // read two bytes at a time, if the bytes
        // are above a certain value they are unicode
        // otherwise the bytes are ASCII characters
        // the %C format will write the character value of bytes
        while (bytesRead < data.length)
        {
            uint16_t code = *((uint16_t*) &bytes[bytesRead]);
            if (code > 0x007E)
            {
                [encodedString appendFormat:@"\\u%04X", code];
            }
            else
            {
                [encodedString appendFormat:@"%C", code];
            }
            bytesRead += sizeof(uint16_t);
        }

        // done
        return encodedString;
    }
