Natural sorting algorithm in PHP with support for Unicode?


The question is not as easy to answer as it seems at first glance. This is one of the areas where PHP's lack of Unicode support hits you with full force.

First of all, natsort(), as suggested by other posters, has nothing to do with sorting arrays of the kind you want to sort. What you are looking for is a locale-aware sorting mechanism, because the order of strings with extended characters always depends on the language in use. Take German, for example: A and Ä can be sorted as if they were the same letter (DIN 5007/1), or Ä can be sorted as if it were "AE" (DIN 5007/2). In Swedish, in contrast, Ä comes at the end of the alphabet.
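To make the language dependence concrete, here is a minimal sketch that sorts the same three words under German and Swedish rules. It assumes the intl extension (mentioned further down in this thread) is installed. sortForLocale() is a hypothetical helper name, and the locale identifiers 'de', 'de@collation=phonebook' and 'sv' are ICU-style identifiers, not something taken from the question.

```php
<?php
// Hypothetical helper: sort a copy of $words with the collation rules of $locale.
function sortForLocale(array $words, string $locale): array
{
    $collator = new Collator($locale); // Collator is provided by the intl extension
    $collator->sort($words);           // sorts in place and reindexes the array
    return $words;
}

if (extension_loaded('intl')) {
    $words = ['Zebra', 'Ärger', 'Afrika'];

    // German dictionary order: Ä is treated like A.
    print_r(sortForLocale($words, 'de'));

    // German phone book order: Ä is treated like AE.
    print_r(sortForLocale($words, 'de@collation=phonebook'));

    // Swedish: Ä is a separate letter near the end of the alphabet.
    print_r(sortForLocale($words, 'sv'));
}
```

With the collation data shipped in current ICU releases, the German dictionary order should put Afrika before Ärger, the phone book order should put Ärger before Afrika, and the Swedish order should move Ärger behind Zebra.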

If you don't use Windows, you're in luck, as PHP provides functions that do exactly this. Using a combination of setlocale(), usort(), strcoll() and the correct UTF-8 locale for your language, you get something like this:

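$array = array('Àgile', 'Ágile', 'Âgile', 'Ãgile', 'Ägile', 'Agile', 'Test');
// Passing '0' only returns the current locale; any other value makes setlocale() return the new one.
$oldLocale = setlocale(LC_COLLATE, '0');
setlocale(LC_COLLATE, '<<your_RFC1766_language_code>>.utf8');
usort($array, 'strcoll');
setlocale(LC_COLLATE, $oldLocale);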

Please note that you must use the UTF-8 variant of the locale in order to sort UTF-8 strings. I reset the locale in the example above to its original value because locale information is kept per process, so a change made with setlocale() can have side effects on other PHP scripts running in the same process. Please see the PHP manual for more details.
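The save, sort and restore steps can be wrapped in one function so the locale is put back even if something goes wrong. This is only a sketch: sortWithLocale() is a hypothetical helper, and the locale names passed to it are assumptions that depend on what is installed on your system (check with `locale -a`).

```php
<?php
/**
 * Sort strings with strcoll() under a temporary LC_COLLATE locale.
 * sortWithLocale() is a hypothetical helper, not part of PHP.
 */
function sortWithLocale(array $items, array $candidates): array
{
    // Passing '0' only reads the current setting; it changes nothing.
    $previous = setlocale(LC_COLLATE, '0');

    // setlocale() tries each candidate in turn and returns false if none is installed.
    if (setlocale(LC_COLLATE, $candidates) === false) {
        throw new RuntimeException('None of the requested locales is available.');
    }

    try {
        usort($items, 'strcoll');
    } finally {
        // Restore the process-wide setting even if sorting throws.
        setlocale(LC_COLLATE, $previous);
    }

    return $items;
}

try {
    // Assumed locale names; different systems spell the UTF-8 suffix differently.
    print_r(sortWithLocale(['Test', 'Ägile', 'Agile'], ['de_DE.UTF-8', 'de_DE.utf8']));
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

Checking the return value matters because a failed setlocale() call leaves the old locale in place, and the sort would then silently use the wrong rules.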

If you do use a Windows machine, there is currently no solution to this problem, and I assume there won't be one before PHP 6. Please see my own question on SO targeting this specific problem.


Nailed it!

$array = array('Ägile', 'Ãgile', 'Test', 'カタカナ', 'かたかな', 'Ágile', 'Àgile', 'Âgile', 'Agile');

function Sortify($string)
{
    return preg_replace('~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|tilde|uml);~i', '$1' . chr(255) . '$2', htmlentities($string, ENT_QUOTES, 'UTF-8'));
}

array_multisort(array_map('Sortify', $array), $array);

Output:

Array
(
    [0] => Agile
    [1] => Ágile
    [2] => Âgile
    [3] => Àgile
    [4] => Ãgile
    [5] => Ägile
    [6] => Test
    [7] => かたかな
    [8] => カタカナ
)

Even better:

if (extension_loaded('intl') === true)
{
    collator_asort(collator_create('root'), $array);
}

Thanks to @tchrist!
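Since the question asks for natural ordering as well, the same collator can also be told to compare runs of digits by their numeric value. The sketch below assumes the intl extension and uses its Collator::NUMERIC_COLLATION attribute. naturalUnicodeSort() is a hypothetical helper name, and the fallback branch is my own addition for systems without intl.

```php
<?php
/**
 * Natural, locale-aware sort of a list of strings.
 * naturalUnicodeSort() is a hypothetical helper, not part of PHP or intl.
 */
function naturalUnicodeSort(array $items, string $locale = 'root'): array
{
    if (extension_loaded('intl')) {
        $collator = new Collator($locale);
        // Compare digit runs by numeric value, so "img2" sorts before "img12".
        $collator->setAttribute(Collator::NUMERIC_COLLATION, Collator::ON);
        $collator->sort($items); // reindexes; use asort() instead to keep keys
        return $items;
    }

    // Fallback without intl: natural order, but byte-wise for non-ASCII text.
    sort($items, SORT_NATURAL);
    return $items;
}

print_r(naturalUnicodeSort(['img12', 'Ägile 10', 'img2', 'Ägile 9', 'img1']));
```

Note that the fallback only gets the numbers right. Accented characters are still compared byte by byte there, which is the limitation the first answer describes.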


natsort($array);
$array = array_values($array);