Weird behavior of BASH glob/regex ranges


This is almost certainly caused by your locale setting. An excerpt from the GNU bash man page, under Pattern Matching:

[..] in the default C locale, [a-dx-z] is equivalent to [abcdxyz]. Many locales sort characters in dictionary order, and in these locales [a-dx-z] is typically not equivalent to [abcdxyz]; it might be equivalent to [aBbCcDdxXyYz], for example. To obtain the traditional interpretation of ranges in bracket expressions, you can force the use of the C locale by setting the LC_COLLATE or LC_ALL environment variable to the value C, or enable the globasciiranges shell option.[..]
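To check which of the settings named in the excerpt applies in your shell, a minimal sketch follows. The helper name `effective_collate` is hypothetical; it relies on the standard precedence of LC_ALL over LC_COLLATE over LANG, and its fallback to C when none is set is an assumption (the real default is implementation-defined). The `globasciiranges` option needs bash 4.3 or later, and as far as I can tell it affects glob-style pattern matching (including parameter expansion), not the `=~` regex operator.

```shell
#!/usr/bin/env bash

# Hypothetical helper: report the locale that governs collation,
# following the precedence LC_ALL > LC_COLLATE > LANG.
# Falls back to "C" when none is set (assumed default).
effective_collate() {
    printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

echo "collation locale: $(effective_collate)"

# Ask bash to use C-locale range semantics for glob patterns (bash 4.3+).
shopt -s globasciiranges

var='ABCDabcd0123'
stripped=${var//[A-Z]/}   # removes only the ASCII upper-case letters
echo "$stripped"
```

With the option enabled, the `[A-Z]` range in the parameter expansion is interpreted as in the C locale, whatever LC_COLLATE says.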

Use the POSIX character classes ([[:upper:]] in this case), or change your locale by setting LC_ALL or LC_COLLATE to C as mentioned above:

    LC_ALL=C var='ABCDabcd0123'
    echo "${var//[A-Z]/}"
    abcd0123
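Assigning LC_ALL at the prompt changes the locale for the rest of the shell session. If that is not wanted, one option is to confine the override to a function with `local`. This is only a sketch: the function name `strip_upper_ascii` is made up for illustration, and it assumes bash restores the previous locale when the local variable goes out of scope.

```shell
#!/usr/bin/env bash

# Hypothetical helper: delete ASCII upper-case letters from a string,
# forcing the C locale only for the duration of the call.
strip_upper_ascii() {
    local LC_ALL=C        # scoped to this function
    local s=$1
    printf '%s\n' "${s//[A-Z]/}"
}

strip_upper_ascii 'ABCDabcd0123'
```

The caller's LC_ALL is left as it was, so the rest of the script keeps its normal locale behaviour.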

Also, under this locale your negated upper-case test fails to match every lower-case letter, so the loop prints all of them:

    LC_ALL=C; for l in {a..z}; do [[ $l =~ [A-Z] ]] || echo "$l"; done

Under the same locale setting, individual lower-case letters do not match [A-Z] (exit status 1):

    [[ a =~ [A-Z] ]]; echo $?
    1
    [[ b =~ [A-Z] ]]; echo $?
    1

but they do match the lower-case range (exit status 0):

    [[ a =~ [a-z] ]]; echo $?
    0
    [[ b =~ [a-z] ]]; echo $?
    0

That said, all of this can be avoided by using the POSIX character classes, which work in a new shell without any locale override:

    echo "${var//[[:upper:]]/}"
    abcd0123

and

    for l in {a..z}; do [[ $l =~ [[:upper:]] ]] || echo "$l"; done
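The two character-class idioms above can be wrapped in small helpers so the locale question does not come up at each call site. A sketch only; the names `is_upper` and `strip_upper` are hypothetical and not part of bash.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed if the argument is a single upper-case letter.
is_upper() {
    [[ $1 == [[:upper:]] ]]
}

# Hypothetical helper: print the argument with upper-case letters removed.
strip_upper() {
    printf '%s\n' "${1//[[:upper:]]/}"
}

for l in a B c D; do
    if is_upper "$l"; then
        echo "$l: upper"
    else
        echo "$l: not upper"
    fi
done

strip_upper 'ABCDabcd0123'
```

Because [[:upper:]] is defined by the locale's character classification rather than by collation order, it does not depend on how the locale sorts letters.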