How to handle GET parameters containing non-utf8 characters?

Tags: express


URL encoding should always be UTF-8; anything else can be treated as an encoding attack and the request rejected. Strictly speaking, there is no such thing as a "non-UTF-8 character", only byte sequences that are not valid UTF-8. I don't know why your application would receive query strings in another encoding, but with browsers you will be fine as long as your pages declare a charset header, because browsers encode form submissions in the page's encoding. For API requests, you can specify UTF-8 and reject invalid UTF-8 with 400 Bad Request.
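A minimal sketch of the reject-as-Bad-Request idea, assuming an Express-style (req, res, next) middleware. The names decodeUtf8OrNull and rejectInvalidUtf8 are made up for illustration; the check relies on decodeURIComponent throwing a URIError when the escapes are not valid UTF-8.

```javascript
// Returns the decoded string, or null if the percent-escapes are not valid UTF-8.
function decodeUtf8OrNull(encoded) {
  try {
    return decodeURIComponent(encoded);
  } catch (err) {
    if (err instanceof URIError) return null;
    throw err;
  }
}

// Hypothetical middleware: answers 400 when any key or value in the raw
// query string fails to decode as UTF-8, otherwise passes the request on.
function rejectInvalidUtf8(req, res, next) {
  const queryStart = req.url.indexOf('?');
  const rawQuery = queryStart === -1 ? '' : req.url.slice(queryStart + 1);
  const invalid = rawQuery
    .split('&')
    .some((pair) => pair.split('=').some((part) => decodeUtf8OrNull(part) === null));
  if (invalid) {
    res.statusCode = 400;
    res.end('Bad Request: query string is not valid UTF-8');
    return;
  }
  next();
}
```

Something like this could be registered with app.use(rejectInvalidUtf8) ahead of the routes that read req.query.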

If you really mean ISO-8859-1, then it's very simple, because each byte value matches the Unicode code point of the same number exactly.

    'T%FCt%20T%FCt'.replace(/%([a-f0-9]{2})/gi, function (f, m1) {
        // Each %XX escape is one ISO-8859-1 byte, which is also its Unicode code point.
        return String.fromCharCode(parseInt(m1, 16));
    });

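In practice, though, it is probably never ISO-8859-1 on the web but actually Windows-1252, which assigns printable characters (such as the euro sign) to most of the bytes 0x80-0x9F.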
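If Windows-1252 is what actually arrives, the same replace trick needs a lookup for the bytes where the two encodings differ. A rough sketch, not a tested library: decodeWindows1252 and CP1252_HIGH are names invented here, and the five bytes Windows-1252 leaves unassigned are simply passed through as their own code points.

```javascript
// Unicode code points for the Windows-1252 bytes in 0x80-0x9F that differ
// from ISO-8859-1. 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned and are
// passed through unchanged by the function below.
const CP1252_HIGH = {
  0x80: 0x20AC, 0x82: 0x201A, 0x83: 0x0192, 0x84: 0x201E, 0x85: 0x2026,
  0x86: 0x2020, 0x87: 0x2021, 0x88: 0x02C6, 0x89: 0x2030, 0x8A: 0x0160,
  0x8B: 0x2039, 0x8C: 0x0152, 0x8E: 0x017D, 0x91: 0x2018, 0x92: 0x2019,
  0x93: 0x201C, 0x94: 0x201D, 0x95: 0x2022, 0x96: 0x2013, 0x97: 0x2014,
  0x98: 0x02DC, 0x99: 0x2122, 0x9A: 0x0161, 0x9B: 0x203A, 0x9C: 0x0153,
  0x9E: 0x017E, 0x9F: 0x0178
};

// Decodes %XX escapes, treating each escaped byte as Windows-1252.
function decodeWindows1252(encoded) {
  return encoded.replace(/%([0-9a-f]{2})/gi, function (match, hex) {
    const byte = parseInt(hex, 16);
    return String.fromCharCode(CP1252_HIGH[byte] || byte);
  });
}

console.log(decodeWindows1252('T%FCt%20T%FCt'));
console.log(decodeWindows1252('%80%2010'));
```

For bytes outside 0x80-0x9F the result is the same as the ISO-8859-1 version, so this only changes the output for characters like the euro sign or curly quotes.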


Maybe node-iconv is a solution. Do you know beforehand which encoding is used?

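    var qs = require('qs');
    var Iconv = require('iconv').Iconv;

    var iconv = new Iconv('ISO-8859-1', 'UTF-8');

    // qs decodes values as UTF-8 by default, so pass a custom decoder that
    // turns the %XX escapes into raw bytes and lets iconv convert those.
    var parsed = qs.parse('foo=bar&xyz=T%FCt%20T%FCt', {
        decoder: function (str) {
            var bytes = Buffer.from(unescape(str.replace(/\+/g, ' ')), 'latin1');
            return iconv.convert(bytes).toString('utf8');
        }
    });
    var xyz = parsed.xyz;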