Why are spaces ignored in natsort / strnatcmp / strnatcasecmp?


If you look at the source code, you can actually see this, and it definitely looks like a bug: http://gcov.php.net/PHP_5_3/lcov_html/ext/standard/strnatcmp.c.gcov.php (scroll down to line 130):

```c
// inside a while loop...
/* Skip consecutive whitespace */
while (isspace((int)(unsigned char)ca)) {
        ca = *++ap;
}
while (isspace((int)(unsigned char)cb)) {
        cb = *++bp;
}
```

Note that this links to 5.3, but the same code still exists in 5.5 (http://gcov.php.net/PHP_5_5/lcov_html/ext/standard/strnatcmp.c.gcov.php).

Admittedly my knowledge of C is limited, but this appears to advance the pointer into each string whenever the current character is whitespace, which effectively ignores that character in the sort. The comment implies that it only does this for consecutive whitespace; however, nothing checks that the previous character was actually whitespace. That would need something like:

```c
// declare these outside the loop
short prevAIsSpace = 0;
short prevBIsSpace = 0;

// ...in the loop
while (prevAIsSpace && isspace((int)(unsigned char)ca)) {
    // won't get here the first time, since prevAIsSpace == 0
    ca = *++ap;
}
// now if the character is a space, flag it for the next iteration
prevAIsSpace = isspace((int)(unsigned char)ca) != 0;

// repeat with string b
while (prevBIsSpace && isspace((int)(unsigned char)cb)) {
    cb = *++bp;
}
prevBIsSpace = isspace((int)(unsigned char)cb) != 0;
```

Someone who actually knows C could probably write this better, but that's the general idea.

On another potentially interesting note, for your example, if you're using PHP >= 5.4, this gives the same result as the usort mentioned by Aaron Saray (it does lose the key/value associations as well):

```php
sort($names, SORT_FLAG_CASE | SORT_STRING);
print_r($names);
```

```
Array
(
    [0] => Van de broecke
    [1] => Van der Luizen
    [2] => Van der Programma
    [3] => vande Huizen
    [4] => vande Kluizen
    [5] => Vande Muizen
    [6] => vander Muizen
    [7] => Vander Veere
    [8] => Vander Zoeker
)
```


Take a look at bugs.php.net #26412 (natsort() was compressing multiple spaces to one space). Apparently, this behavior exists so that "aa", "a a", and "a  a" (note the two spaces) do not sort as identical strings.


Like other answers/commenters have said, this is a known issue. However, you can write your own sort with usort(). Please try this and see if it works:

```php
usort($names2, function ($first, $second) {
    // compare case-insensitively; strings differing only in case tie (return 0)
    $a = strtolower($first);
    $b = strtolower($second);
    if ($a == $b) {
        return 0;
    }
    return ($a < $b) ? -1 : 1;
});
```

I noticed the output is slightly different from your suggested answer:

You suggested:

```
[4] => Van der Programma
[8] => Van der Luizen
```

But I'm sure this was a typo - these should be swapped. :)