Comparing UTF-8 String


IMPORTANT

This answer is meant for situations where it's not possible to run/install the 'intl' extension. It only sorts strings by replacing accented characters with their non-accented equivalents. To sort accented characters according to a specific locale, using a Collator is a better approach; see the other answer to this question for more information.

Sorting by non-accented characters in PHP 5.2

You may try converting both strings to ASCII using iconv() with the //TRANSLIT option to get rid of accented characters:

$str1 = iconv('utf-8', 'ascii//TRANSLIT', $str1);

Then do the comparison on the converted strings.
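As a rough sketch of that comparison step, the two conversions and the comparison can be wrapped in one function. The name compareTransliterated() is hypothetical and not part of the original answer, and the fallback to the raw strings when iconv() fails is an assumption about what is reasonable. The transliteration result depends on the platform's iconv implementation and on the active locale, so no output is shown.

```php
<?php
// Hypothetical helper (not from the original answer): compares two UTF-8
// strings by their ASCII transliterations.
function compareTransliterated(string $a, string $b): int
{
    $asciiA = iconv('UTF-8', 'ASCII//TRANSLIT', $a);
    $asciiB = iconv('UTF-8', 'ASCII//TRANSLIT', $b);

    // iconv() returns false on failure; fall back to the raw strings then.
    if ($asciiA === false || $asciiB === false) {
        return strcmp($a, $b);
    }

    return strcmp($asciiA, $asciiB);
}

// Assumes the fr_FR locale is installed; //TRANSLIT output varies with it.
setlocale(LC_ALL, 'fr_FR');

// A negative result means the first string sorts before the second.
var_dump(compareTransliterated('Émilie', 'Zoey') < 0);
```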

See the documentation here:

http://www.php.net/manual/en/function.iconv.php

[Updated in response to @Esailija's remark] I overlooked the problem of //TRANSLIT translating accented characters in unexpected ways. This problem is mentioned in this question: php iconv translit for removing accents: not working as excepted?

To make the iconv() approach work, I've added a code sample below that uses preg_replace() to strip every character that is neither a word character nor whitespace from the resulting string.

<?php
setlocale(LC_ALL, 'fr_FR');

$names = array(
    'Zoey and another (word) ',
    'Émilie and another word',
    'Amber',
);

$converted = array();
foreach ($names as $name) {
    $converted[] = preg_replace('#[^\w\s]+#', '', iconv('UTF-8', 'ASCII//TRANSLIT', $name));
}

sort($converted);

echo '<pre>';
print_r($converted);

// Array
// (
//     [0] => Amber
//     [1] => Emilie and another word
//     [2] => Zoey and another word 
// )


There is no native way to do this; however, there is a PECL extension that provides a Collator class: http://php.net/manual/de/class.collator.php

$c = new Collator('fr_FR');
if ($c->compare('Émily', 'Zoey') < 0) {
    echo 'Émily < Zoey';
}
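The same Collator can also sort a whole array rather than compare one pair. The following is a minimal sketch, assuming the intl extension is loaded and the file is saved as UTF-8; the sample names are illustrative, and the class_exists() guard is only there so the script does not fail where intl is missing.

```php
<?php
// Sketch only: Collator is provided by the intl extension.
$names = array(
    'Zoey and another (word) ',
    'Émilie and another word',
    'Amber',
);

if (class_exists('Collator')) {
    $collator = new Collator('fr_FR');

    // Collator::sort() reorders the array in place using locale-aware
    // comparison; the strings themselves are left unchanged.
    $collator->sort($names);

    print_r($names);
}
```

With these names, the expected order is Amber, then Émilie, then Zoey, and the accents and parentheses are preserved in the output.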


I recommend using the usort() function instead, which avoids modifying the values while still comparing them correctly.

Example:

<?php
setlocale(LC_ALL, 'fr_FR');

$names = [
    'Zoey and another (word) ',
    'Émilie and another word',
    'Amber'
];

function compare(string $a, string $b) {
    $a = preg_replace('#[^\w\s]+#', '', iconv('utf-8', 'ascii//TRANSLIT', $a));
    $b = preg_replace('#[^\w\s]+#', '', iconv('utf-8', 'ascii//TRANSLIT', $b));
    return strcmp($a, $b);
}

usort($names, 'compare');

echo '<pre>';
print_r($names);
echo '</pre>';

with result:

Array
(
    [0] => Amber
    [1] => Émilie and another word
    [2] => Zoey and another (word) 
)
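The compare() callback above transliterates both strings on every call, and usort() calls it many times for a large array. One possible variation, sketched here under assumptions, is to compute each sort key once and let array_multisort() reorder the original values by those keys. The function names sortKey() and sortByAsciiKey() are hypothetical, and the fallback to the raw string when iconv() fails is an assumption, not part of the original answer.

```php
<?php
// Hypothetical helper: builds the ASCII key a string is sorted by.
function sortKey(string $s): string
{
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $s);

    // iconv() returns false on failure; use the raw string in that case.
    if ($ascii === false) {
        $ascii = $s;
    }

    // Drop everything that is neither a word character nor whitespace.
    return preg_replace('#[^\w\s]+#', '', $ascii);
}

// Hypothetical helper: returns the names ordered by their ASCII keys,
// leaving the original strings untouched.
function sortByAsciiKey(array $names): array
{
    // Each key is computed exactly once per element.
    $keys = array_map('sortKey', $names);

    // Sort the keys as strings and apply the same reordering to $names.
    array_multisort($keys, SORT_ASC, SORT_STRING, $names);

    return $names;
}

// Assumes the fr_FR locale is installed; //TRANSLIT output varies with it.
setlocale(LC_ALL, 'fr_FR');

print_r(sortByAsciiKey(['Zoey and another (word) ', 'Émilie and another word', 'Amber']));
```

Whether this is worth the extra code depends on the array size; for a handful of names the usort() version is simpler.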