NSString - Convert to pure alphabet only (i.e. remove accents+punctuation)


NSString* finish = [[start componentsSeparatedByCharactersInSet:[[NSCharacterSet letterCharacterSet] invertedSet]] componentsJoinedByString:@""];


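Before using any of these solutions, don't forget to use decomposedStringWithCanonicalMapping to decompose any accented letters. This will turn, for example, é (U+00E9) into e followed by a combining acute accent (U+0065 U+0301). Then, when you strip out the non-alphanumeric characters, the unaccented letters will remain. One caveat: letterCharacterSet is documented as covering Unicode categories L* and M*, so it also matches combining marks; to actually drop the accents, filter against a set that excludes them (for example, by removing nonBaseCharacterSet).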

This matters because you probably don't want, say, “dän” and “dün”* to be treated as the same. If you strip out all accented letters, as some of these solutions may do, you'll end up with “dn” for both, so those strings will compare as equal.

So, you should decompose them first, so that you can strip the accents and leave the letters.
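As a rough sketch of the decompose-then-strip idea, the snippet below combines decomposedStringWithCanonicalMapping with the componentsSeparatedByCharactersInSet: approach shown earlier. LettersOnly is a hypothetical helper name, not something from the original answers, and the code assumes ARC. Because letterCharacterSet also matches combining marks, the sketch subtracts nonBaseCharacterSet from it before filtering.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answers): decompose, then keep base letters only.
static NSString *LettersOnly(NSString *input) {
    // Split accented letters into base letter + combining mark, e.g. "ä" -> "a" + U+0308.
    NSString *decomposed = [input decomposedStringWithCanonicalMapping];

    // letterCharacterSet covers Unicode categories L* and M*, so remove the marks (M*) from it.
    NSMutableCharacterSet *keep = [[NSCharacterSet letterCharacterSet] mutableCopy];
    [keep formIntersectionWithCharacterSet:[[NSCharacterSet nonBaseCharacterSet] invertedSet]];

    // Drop everything that is not a base letter and rejoin the remaining pieces.
    return [[decomposed componentsSeparatedByCharactersInSet:[keep invertedSet]]
            componentsJoinedByString:@""];
}

int main(void) {
    @autoreleasepool {
        NSLog(@"%@", LettersOnly(@"dän!"));  // intended result: dan
        NSLog(@"%@", LettersOnly(@"dün?"));  // intended result: dun
    }
    return 0;
}
```

With this ordering the two German words stay distinct ("dan" vs. "dun") instead of both collapsing to "dn".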

*Example from German. Thanks to Joris Weimar for providing it.


In an answer to a similar question, Ole Begemann suggests using stringByFoldingWithOptions:locale:, and I believe this is the best solution here:

NSString *accentedString = @"ÁlgeBra";
NSString *unaccentedString = [accentedString stringByFoldingWithOptions:NSDiacriticInsensitiveSearch locale:[NSLocale currentLocale]];

Depending on the nature of the strings you want to convert, you might want to set a fixed locale (e.g. English) instead of using the user's current locale. That way, you can be sure to get the same results on every machine.
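A minimal sketch of the fixed-locale variant follows. The identifier en_US_POSIX is an assumption made for illustration (the answer only says "e.g. English"); any fixed identifier that suits your data could be substituted.

```objective-c
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // Assumption: en_US_POSIX is chosen here as a locale that does not vary with user settings.
        NSLocale *fixedLocale = [NSLocale localeWithLocaleIdentifier:@"en_US_POSIX"];

        NSString *accentedString = @"ÁlgeBra";
        // Fold away diacritics using the fixed locale instead of the user's current one.
        NSString *unaccentedString =
            [accentedString stringByFoldingWithOptions:NSDiacriticInsensitiveSearch
                                                locale:fixedLocale];
        NSLog(@"%@", unaccentedString);
    }
    return 0;
}
```

Folding only removes diacritics; punctuation and other non-letters still need a separate filtering step if you want letters only.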