Weird behavior of BASH glob/regex ranges
It almost certainly has to do with your locale setting. An excerpt from the GNU bash man page, under Pattern Matching:
> [..] in the default `C` locale, `[a-dx-z]` is equivalent to `[abcdxyz]`. Many locales sort characters in dictionary order, and in these locales `[a-dx-z]` is typically not equivalent to `[abcdxyz]`; it might be equivalent to `[aBbCcDdxXyYz]`, for example. To obtain the traditional interpretation of ranges in bracket expressions, you can force the use of the C locale by setting the `LC_COLLATE` or `LC_ALL` environment variable to the value `C`, or enable the `globasciiranges` shell option. [..]
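For the `globasciiranges` route mentioned at the end of the excerpt, here is a minimal sketch. It assumes bash 4.3 or newer (where the option exists), and `strip_upper_ranges` is a hypothetical helper name. As far as I know the option only governs glob-style pattern matching such as `${var//pattern/}` and `[[ ... == ... ]]`, not the regex engine behind `=~`.

```bash
#!/usr/bin/env bash
# Sketch: make bracket ranges in glob patterns use ASCII (C-locale) order
# without touching LC_ALL or LC_COLLATE. Assumes bash 4.3 or later.
shopt -s globasciiranges

# strip_upper_ranges is a hypothetical helper name used for illustration.
strip_upper_ranges() {
    local s=$1
    # ${s//pattern/} deletes every character matching the bracket range;
    # with globasciiranges on, [A-Z] means only the ASCII letters A..Z.
    printf '%s\n' "${s//[A-Z]/}"
}

strip_upper_ranges 'ABCDabcd0123'
```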
Use the POSIX character classes (`[[:upper:]]` in this case), or change your locale by setting `LC_ALL` or `LC_COLLATE` to `C`, as mentioned above.
```bash
LC_ALL=C
var='ABCDabcd0123'
echo "${var//[A-Z]/}"   # prints: abcd0123
```
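If you would rather not switch the locale for the rest of the script, one possible approach is to confine the `LC_ALL=C` assignment to a subshell, so it vanishes when the subshell exits. This is a sketch under that assumption; `strip_upper_c` is a hypothetical helper name.

```bash
# Sketch: apply the C locale to a single expansion only.
# strip_upper_c is a hypothetical helper name used for illustration.
strip_upper_c() {
    (
        # This assignment only affects the subshell created by ( ... ).
        LC_ALL=C
        # In the C locale, [A-Z] matches exactly the ASCII letters A..Z.
        printf '%s\n' "${1//[A-Z]/}"
    )
}

strip_upper_c 'ABCDabcd0123'
```

The caller's `LC_ALL` (set or unset) is left exactly as it was after the function returns.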
Also, with this locale set, your upper-case test `[[ $l =~ [A-Z] ]]` fails for every lower-case letter, so the `|| echo` branch prints all of them:
```bash
LC_ALL=C
for l in {a..z}; do [[ $l =~ [A-Z] ]] || echo "$l"; done
```
Likewise, under the above locale setting, a lower-case letter does not match the upper-case range:

```bash
[[ a =~ [A-Z] ]]; echo $?   # 1
[[ b =~ [A-Z] ]]; echo $?   # 1
```
but it does match the lower-case range:

```bash
[[ a =~ [a-z] ]]; echo $?   # 0
[[ b =~ [a-z] ]]; echo $?   # 0
```
That said, all of this can be avoided by using the POSIX-specified character classes. In a new shell, without any locale setting:
```bash
var='ABCDabcd0123'
echo "${var//[[:upper:]]/}"   # prints: abcd0123
```
and
```bash
# prints every lower-case letter, a through z
for l in {a..z}; do [[ $l =~ [[:upper:]] ]] || echo "$l"; done
```
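The same POSIX class also works in glob-style `==` matches inside `[[ ]]`. Below is a minimal sketch that wraps the check in a function; `is_upper` is a hypothetical name, and it assumes the argument is a single character.

```bash
# Sketch: locale-independent upper-case check using a POSIX character class.
# is_upper is a hypothetical helper name used for illustration.
is_upper() {
    # [[:upper:]] means "an upper-case letter in the current locale", so it
    # does not depend on how the locale collates a range like [A-Z].
    # The unquoted right-hand side of == is treated as a glob pattern.
    [[ $1 == [[:upper:]] ]]
}

for l in a B c D; do
    if is_upper "$l"; then
        echo "$l: upper"
    else
        echo "$l: not upper"
    fi
done
```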