Sequence of logical OR in ES6/Unicode regular expression in Chrome ✗ vs Firefox ✓
Without the u flag, your regexp works, and this is no wonder: in BMP mode (no "u") it compares 16-bit units to 16-bit units, that is, one surrogate pair to another surrogate pair.
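To make the "16-bit units" point concrete, here is a small sketch (the variable names are only illustrative, they are not from the question). It assumes an engine that implements the u flag per spec and shows that an emoji outside the BMP occupies two units in a string, and that a flagless pattern simply matches those units one after another:

```javascript
// Illustrative names only; none of these come from the original question.
const shrimp = '🍤'; // U+1F364, outside the BMP

// A JavaScript string is a sequence of 16-bit units, so the emoji has length 2.
const units = Array.from({ length: shrimp.length }, (_, i) =>
  shrimp.charCodeAt(i).toString(16));
console.log(shrimp.length, units); // 2 and ['d83c', 'df64']

// "." consumes one unit without "u" and one whole code point with "u".
const dotNoU = shrimp.match(/./)[0].length;
const dotU = shrimp.match(/./u)[0].length;
console.log(dotNoU, dotU); // 1 2

// Without "u" the alternation is just three two-unit sequences,
// so every emoji in the string is found.
const noU = '🍤🍦🍋🍋🍦🍤'.match(/🍤|🍦|🍋/g);
console.log(noU.length); // 6
```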
The behaviour in "u" mode (which is supposed to compare code points and not units) does indeed look like a Chrome bug. In the meantime you can enclose each alternative in a group, which seems to work fine:
m = '🍤🍦🍋🍋🍦🍤'.match(/(🍤)|(🍦)|(🍋)/ug)
console.log(m)

// note that the groups must be capturing!
// this doesn't work:
m = '🍤🍦🍋🍋🍦🍤'.match(/(?:🍤)|(?:🍦)|(?:🍋)/ug)
console.log(m)
And here's a quick proof that more than two SMP alternatives are broken in the u mode:
// insert a whatever range
// from https://en.wikipedia.org/wiki/Plane_(Unicode)#Supplementary_Multilingual_Plane
var range = '11300-1137F';
range = range.split('-').map(x => parseInt(x, 16))

var chars = [];
for (var i = range[0]; i <= range[1]; i++) {
    chars.push(String.fromCodePoint(i))
}

var str = chars.join('');
while(chars.length) {
    var re = new RegExp(chars.join('|'), 'u')
    if(str.match(re))
        console.log(chars.length, re);
    chars.pop();
}
In Chrome, it only logs the last two regexes (those with 2 and 1 alternatives).
Without the "u" flag it also works for me in Chrome (52.0.2743.116), so the u flag itself seems to be broken. The flagless version does fail, though, once you use a multiplier:
'🍤🍤🍦🍦🍦🍦🍋🍋🍋🍋🍦🍦🍦🍦🍤🍤'.match(/🍤|🍦{2}|🍋/g)
-> null
{1} and {1,} seem to work; I assume they are simplified into no quantifier and +, respectively. I assume that without the "u" flag 🍦{2} is interpreted as \ud83c\udf66{2}, which would explain the behaviour.
I just tested with (?:🍦){2} and this seems to work right. I guess this confirms my assumption about the multiplier.
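The assumption about the multiplier can be checked in isolation. The following is only a sketch with made-up variable names; it assumes a spec-conforming engine and tests the quantified emoji on its own, without the alternation:

```javascript
// Illustrative names only.
const twoCones = '🍦🍦'; // units: d83c df66 d83c df66

// Without "u", {2} binds to the last unit only, i.e. the low surrogate \udf66,
// so the pattern asks for d83c df66 df66, which is not in the string.
const bare = /🍦{2}/.test(twoCones);

// The same pattern written with explicit escapes matches exactly that odd sequence.
const explicit = /\ud83c\udf66{2}/.test('\ud83c\udf66\udf66');

// Grouping makes the quantifier apply to the whole surrogate pair.
const grouped = /(?:🍦){2}/.test(twoCones);

// With "u" the emoji is one code point, so no group is needed.
const withU = /🍦{2}/u.test(twoCones);

console.log(bare, explicit, grouped, withU); // false true true true
```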
Here is a quick fix for that:
//a utility I usually have in my codes
var replace = (pattern, replacement) => value => String(value).replace(pattern, replacement);

var fixRegexSource = replace(
    /[\ud800-\udbff][\udc00-\udfff]/g,
    //"(?:$&)" //not sure whether this might still be buggy
    //that's why I convert it into the unicode-syntax,
    //this can't be misinterpreted
    c => `(?:\\u${c.charCodeAt(0).toString(16)}\\u${c.charCodeAt(1).toString(16)})`
);

var fixRegex = regex => new RegExp(
    fixRegexSource(regex.source),
    regex.flags.replace("u", "")
);
Sorry, I didn't come up with better function names.
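For reference, a usage sketch of the fix. The three helpers are restated so the snippet runs on its own; the names fixed, sample and matches are just for illustration. It assumes that regex.source returns the emoji as literal characters, which is what current engines do:

```javascript
// Helpers restated from the fix above so that this snippet is self-contained.
const replace = (pattern, replacement) => value =>
  String(value).replace(pattern, replacement);

const fixRegexSource = replace(
  /[\ud800-\udbff][\udc00-\udfff]/g,
  c => `(?:\\u${c.charCodeAt(0).toString(16)}\\u${c.charCodeAt(1).toString(16)})`
);

const fixRegex = regex => new RegExp(
  fixRegexSource(regex.source),
  regex.flags.replace('u', '')
);

// Each surrogate pair becomes its own non-capturing group and the u flag is dropped.
const fixed = fixRegex(/🍤|🍦{2}|🍋/ug);
console.log(fixed.source); // (?:\ud83c\udf64)|(?:\ud83c\udf66){2}|(?:\ud83c\udf4b)
console.log(fixed.flags);  // g

// The multiplier now applies to the whole emoji.
const sample = '🍤🍤🍦🍦🍦🍦🍋🍋🍋🍋🍦🍦🍦🍦🍤🍤';
const matches = sample.match(fixed);
console.log(matches.length); // 12
```

One caveat, as far as I can tell: since the u flag is removed and every pair is wrapped in a group, patterns that put astral characters inside a character class like [🍤🍦], or that rely on \u{...} escapes, would not survive this rewrite unchanged.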