JavaScript validation issue with international characters


I think the jQuery Validation plugin's own email and url methods are a good reference here, e.g. the email method:

email: function(value, element) {
    return this.optional(element) || /^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$/i.test(value);
},

That regex was generated by a script rather than written by hand.

Specifically, replacing your arbitrary list of "crazy moon" characters with this character class could help:

[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]

This avoids the character encoding issues you're seeing elsewhere: instead of literal non-ASCII characters, whose bytes depend on how the file is saved and decoded, it describes whole ranges of characters with ASCII \u escapes. It isn't necessarily more readable, but it is certainly shorter than your full list.
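As a rough illustration, here is a minimal sketch of a name check built on that character class. The rules (one or more "words" of a-z or the non-ASCII ranges, separated by a single space, hyphen or apostrophe) and the isValidName helper are my own assumptions, not taken from the question. Note that the ranges admit every character inside them, not only letters, so something like "×" (U+00D7) also passes.

```javascript
// Minimal sketch; isValidName and its rules are hypothetical.
// Allowed characters: ASCII a-z (case-insensitive via the "i" flag) plus the
// non-ASCII ranges \u00A0-\uD7FF, \uF900-\uFDCF, \uFDF0-\uFFEF.
// Words may be joined by exactly one space, apostrophe or hyphen.
// The pattern is written with \u escapes, so this source file stays pure ASCII.
const NAME_PATTERN =
  /^[a-z\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]+(?:[ '\-][a-z\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]+)*$/i;

function isValidName(value) {
  return NAME_PATTERN.test(value);
}

console.log(isValidName("Zo\u00EB Salda\u00F1a")); // true  ("Zoë Saldaña")
console.log(isValidName("O'Brien"));               // true
console.log(isValidName("R2D2"));                  // false (digits are not allowed)
```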


This isn't really an answer, but I don't have 50 rep yet to add a comment... The problem can definitely be attributed to encoding issues.

Yeah, "ECMAScript shouldn't care about encoding..." and so on, but if you're on Firefox, go to View > Character Encoding > Western (ISO-8859-1) and then try the Name field.

It works fine for me after changing the encoding manually (granted, the rest of the page doesn't like the encoding switch :P).

(In IE8 you can go to Page > Encoding > Western European (Windows) to get the same effect.)
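To make the encoding explanation concrete, here is a Node.js sketch of one scenario consistent with that workaround: the script was saved as ISO-8859-1 but decoded as UTF-8. The question doesn't confirm which way round the mismatch was, so treat this as an assumption; Buffer is a Node global, not a browser API.

```javascript
// Simulates a script saved as ISO-8859-1 but decoded as UTF-8 (Node.js only).
const typedPattern = "^[a-zé]+$"; // what the author typed in the editor

// Saved as ISO-8859-1 ("latin1" in Node): "é" becomes the single byte 0xE9.
const savedBytes = Buffer.from(typedPattern, "latin1");

// Decoded as UTF-8: a lone 0xE9 is not a valid UTF-8 sequence,
// so the decoder substitutes U+FFFD (the replacement character).
const decodedPattern = savedBytes.toString("utf8");

const intended = new RegExp(typedPattern);
const asLoaded = new RegExp(decodedPattern);

console.log(decodedPattern.includes("\uFFFD")); // true
console.log(intended.test("café"));             // true
console.log(asLoaded.test("café"));             // false: "é" is no longer in the class
```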


What is the character encoding of the JS file?
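If you're not sure what encoding the file was saved in, a quick Node.js heuristic like the following may help. guessEncoding and the file name in the usage comment are hypothetical; the check relies on TextDecoder's fatal mode, cannot distinguish ISO-8859-1 from windows-1252, and cannot prove a file is UTF-8, since some Latin-1 byte sequences happen to be valid UTF-8.

```javascript
// Rough heuristic for "what encoding is this JS file in?" (Node.js 11+).
function guessEncoding(bytes) {
  // Pure ASCII bytes decode identically as ASCII, ISO-8859-1 and UTF-8.
  if (bytes.every((b) => b < 0x80)) {
    return "ascii";
  }
  try {
    // With fatal: true, TextDecoder throws on malformed UTF-8
    // instead of silently inserting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return "utf-8";
  } catch (err) {
    return "not utf-8 (possibly ISO-8859-1 / windows-1252)";
  }
}

// Usage, e.g.: guessEncoding(require("fs").readFileSync("validation.js"))
console.log(guessEncoding(Buffer.from("var a = 1;")));             // "ascii"
console.log(guessEncoding(Buffer.from("var s = 'é';", "utf8")));   // "utf-8"
console.log(guessEncoding(Buffer.from("var s = 'é';", "latin1"))); // "not utf-8 (possibly ISO-8859-1 / windows-1252)"
```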

For XML names I use this RegExp:

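/**
 * Definition of an XML Name (the "Name" production of XML 1.0).
 * The supplementary range #x10000-#xEFFFF needs ES2015 \u{...} escapes and the "u" flag:
 * a plain "\u010000" in a string literal is read as "\u0100" followed by "00".
 */
var NameStartChar = "A-Za-z:_\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D" +
                    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF" +
                    "\uF900-\uFDCF\uFDF0-\uFFFD\u{10000}-\u{EFFFF}";
var NameChar = NameStartChar + "\\-\\.0-9\u00B7\u0300-\u036F\u203F-\u2040";
var Name = "^[" + NameStartChar + "][" + NameChar + "]*$";
new RegExp(Name, "u").test(value); // true if value is a valid XML Name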

It works like a charm with internationalized characters, too. Note the escaping: because every non-ASCII character is written as a \u escape, the JS file itself contains only ASCII characters, so I don't get into trouble when dealing with ISO-8859 vs. UTF-8 charsets.
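To produce escapes like these without typing them by hand, a small one-off helper can convert the non-ASCII characters in a pattern to \uXXXX form. This is a sketch with a hypothetical toAsciiEscapes name; it escapes UTF-16 code units, so characters outside the Basic Multilingual Plane come out as two escaped surrogate halves, which only act as a single character in a regex that uses the "u" flag.

```javascript
// Converts every non-ASCII UTF-16 code unit to a \uXXXX escape, so the result
// can be pasted into an ASCII-only JS file. Run it once, at authoring time.
function toAsciiEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    out += code < 0x80
      ? text[i] // ASCII characters are kept as-is
      : "\\u" + code.toString(16).toUpperCase().padStart(4, "0");
  }
  return out;
}

const escaped = toAsciiEscapes("a-zÀ-ÖØ-ö");
console.log(escaped); // a-z\u00C0-\u00D6\u00D8-\u00F6

// The escaped source still builds the intended character class.
const pattern = new RegExp("^[" + escaped + "]+$", "i");
console.log(pattern.test("Émile")); // true
```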

This no longer holds if you use a character encoding of which ASCII is not a subset, such as UTF-16 (e.g. in Asia).
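The ASCII-subset point can be checked directly in Node.js: the same ASCII-only text gives identical bytes in ASCII and UTF-8, but a different byte sequence in UTF-16. This sketch assumes Node's Buffer encoding names ("ascii", "utf8", "utf16le").

```javascript
// Same ASCII-only source text, three encodings (Node.js only).
const source = "var a;";
const asAscii = Buffer.from(source, "ascii");
const asUtf8 = Buffer.from(source, "utf8");
const asUtf16 = Buffer.from(source, "utf16le");

console.log(asUtf8.equals(asAscii)); // true: ASCII is a subset of UTF-8
console.log(asAscii.length);         // 6
console.log(asUtf16.length);         // 12: two bytes per character
console.log(asUtf16[1]);             // 0 (high byte of "v" in little-endian UTF-16)
console.log(asAscii.equals(asUtf16.subarray(0, asAscii.length))); // false
```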

Cheers,