
NSDiacriticInsensitiveSearch and arabic search


You can use a regular expression to handle the different shapes of the Arabic letter Alif.

Assume the context is "محمد بن إبراهيم الابراهيمي" and the pattern to search for is "إبراهيم". You can convert the pattern into a regular expression that accepts every shape of the Alif ("أ", "إ", "ا"), which gives "(أ|إ|ا)بر(أ|إ|ا)هيم". This matches the pattern in all of its possible spellings.

Here is a simple program that I wrote:

    #import <Foundation/Foundation.h>

    // Replaces every Alif shape in the string with an alternation that matches any of them.
    NSString * arabify(NSString * string)
    {
        NSRegularExpression * alifRegex = [NSRegularExpression regularExpressionWithPattern:@"(أ|ا|إ)" options:0 error:nil];
        return [alifRegex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, [string length]) withTemplate:@"(أ|ا|إ)"];
    }

    int main(int argc, const char * argv[])
    {
        @autoreleasepool {
            NSString * context = @"محمد بن إبراهيم الابراهيمي";
            NSString * pattern = @"إبراهيم";

            // Get the regex for the Arabic word.
            NSString * regex = arabify(pattern);

            NSLog(@"context = %@", context);
            NSLog(@"pattern = %@", pattern);
            NSLog(@"regex = %@", regex);

            // Search the context using the generated regular expression.
            NSRange range = [context rangeOfString:regex options:NSRegularExpressionSearch];
            if (range.location == NSNotFound)
            {
                NSLog(@"Not found.");
            }
            else
            {
                NSLog(@"Found.");
                NSLog(@"location = %lu, length = %lu", (unsigned long)range.location, (unsigned long)range.length);
            }
        }
        return 0;
    }
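If the pattern comes from user input, it may contain regular-expression metacharacters (".", "(", "*" and so on) that arabify() passes through unescaped. A minimal sketch of a hardened variant follows; the name arabifyEscaped is hypothetical, it uses NSRegularExpression's escapedPatternForString: before widening the Alif shapes, and it swaps the alternation for a character class. Including آ (U+0622, ALEF WITH MADDA ABOVE) in the class is an assumption of this sketch, not part of the original answer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical variant of arabify(): first escapes regex metacharacters in the
// pattern, then replaces every Alif shape with a character class matching any of them.
// Adding آ (U+0622) to the class is an assumption, not part of the original answer.
static NSString *arabifyEscaped(NSString *string)
{
    // Protect characters such as "." or "(" so they are matched literally.
    NSString *escaped = [NSRegularExpression escapedPatternForString:string];
    NSRegularExpression *alifRegex =
        [NSRegularExpression regularExpressionWithPattern:@"[أاإآ]" options:0 error:nil];
    // Arabic letters are not metacharacters, so escaping left them untouched.
    return [alifRegex stringByReplacingMatchesInString:escaped
                                               options:0
                                                 range:NSMakeRange(0, escaped.length)
                                          withTemplate:@"[أاإآ]"];
}

int main(int argc, const char *argv[])
{
    @autoreleasepool {
        NSString *context = @"محمد بن إبراهيم الابراهيمي";
        NSString *regex = arabifyEscaped(@"إبراهيم");
        NSRange range = [context rangeOfString:regex options:NSRegularExpressionSearch];
        if (range.location == NSNotFound) {
            NSLog(@"Not found.");
        } else {
            NSLog(@"Found at location = %lu, length = %lu",
                  (unsigned long)range.location, (unsigned long)range.length);
        }
    }
    return 0;
}
```

For the sample context this reports the first occurrence, the word "إبراهيم" at location 8 with length 7; the second spelling inside "الابراهيمي" would also match but is later in the string.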

Good luck brother.


It seems that you are using the precomposed character ARABIC LETTER ALEF WITH HAMZA ABOVE (U+0623), which does not collate with the other representations of Alif.

Have you considered encoding the Alif differently? You could use the decomposed form, which would then collate with the plain Alif (U+0627) as you intend:

ARABIC LETTER ALEF (U+0627) followed by ARABIC HAMZA ABOVE (U+0654)

See here: http://www.fileformat.info/info/unicode/char/0623/index.htm
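To see the decomposed form in practice, here is a minimal Objective-C sketch. It assumes that Foundation's decomposedStringWithCanonicalMapping (Unicode NFD) is an acceptable way to obtain the decomposed Alif; the helper name decompose is hypothetical. It logs the code units of the decomposed character and then runs an NSDiacriticInsensitiveSearch for the plain-Alif spelling. Whether that search ignores the combining hamza depends on Foundation's matching rules, so check the logged result on your system instead of relying on it.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the canonical decomposition (Unicode NFD) of a string.
// U+0623 (ALEF WITH HAMZA ABOVE) decomposes to U+0627 U+0654, and
// U+0625 (ALEF WITH HAMZA BELOW) decomposes to U+0627 U+0655.
static NSString *decompose(NSString *string)
{
    return [string decomposedStringWithCanonicalMapping];
}

int main(int argc, const char *argv[])
{
    @autoreleasepool {
        // Build the precomposed character from its code point so the example
        // does not depend on how the source file itself is normalized.
        NSString *precomposed = [NSString stringWithFormat:@"%C", (unichar)0x0623];
        NSString *decomposed = decompose(precomposed);

        // Log each UTF-16 code unit of the decomposed form.
        for (NSUInteger i = 0; i < decomposed.length; i++) {
            NSLog(@"U+%04X", (unsigned)[decomposed characterAtIndex:i]);
        }

        // Search for the plain-Alif spelling in a context that uses the hamza form.
        // The result depends on whether Foundation treats the hamza as a diacritic.
        NSString *context = decompose(@"محمد بن إبراهيم");
        NSString *pattern = decompose(@"ابراهيم");
        NSRange range = [context rangeOfString:pattern options:NSDiacriticInsensitiveSearch];
        NSLog(@"%@", range.location == NSNotFound ? @"Not found." : @"Found.");
    }
    return 0;
}
```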