Using encodeURI() vs. escape() for utf-8 strings in JavaScript

javascript unicode utf-8 escaping encode

Hi!

When it comes to escape and unescape, I live by two rules:

Avoid them when you easily can.
Otherwise, use them.

Avoiding them when you easily can:

As mentioned in the question, both escape and unescape have been deprecated. In general, one should avoid using deprecated functions.

So, if encodeURIComponent or encodeURI does the trick for you, you should use that instead of escape.
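To show why the two are not interchangeable, here is a small comparison. It is a sketch that assumes a standards-compliant engine such as a current browser or Node.js. For characters below U+0100, `escape` emits a single Latin-1 byte. Above that range it emits a non-standard `%uXXXX` form. `encodeURIComponent`, by contrast, always emits UTF-8 bytes:

```javascript
// Compare the legacy escape() with encodeURIComponent() on non-ASCII input.
// escape() is a legacy (Annex B) function and does NOT produce UTF-8.
const samples = ["\u00E9" /* é */, "\u20AC" /* € */];

for (const s of samples) {
  console.log(s, escape(s), encodeURIComponent(s));
}
// é %E9 %C3%A9
// € %u20AC %E2%82%AC
```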

Using them when you can't easily avoid them:

Browsers will, as far as possible, strive to achieve backwards compatibility. All major browsers have already implemented escape and unescape; why would they un-implement them?

Browsers would have to redefine escape and unescape if a new specification required them to do so. But wait! The people who write specifications are quite smart. They, too, are interested in not breaking backwards compatibility!

I realize that the above argument is weak. But trust me, ... when it comes to browsers, deprecated stuff works. This even includes deprecated HTML tags like <xmp> and <center>.

Using `escape` and `unescape`:

So naturally, the next question is, when would one use escape or unescape?

Recently, while working on CloudBrave, I had to deal with UTF-8, Latin-1, and conversions between the two.

After reading a bunch of blog posts, I realized how simple this was:

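// JavaScript string -> "binary" string holding one character per UTF-8 byte
var utf8_to_latin1 = function (s) {
    return unescape(encodeURIComponent(s));
};

// "Binary" string of UTF-8 bytes -> JavaScript string
var latin1_to_utf8 = function (s) {
    return decodeURIComponent(escape(s));
};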

Without escape and unescape, these conversions are rather involved. By not avoiding escape and unescape, life becomes simpler.
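Here is a usage sketch of the two helpers above, restated so the snippet runs on its own. It shows what the "latin1" string actually contains: one character per UTF-8 byte. It also shows that the reverse direction throws a `URIError` when the bytes are not valid UTF-8. The sample inputs are illustrative choices, not from the answer:

```javascript
// The two helpers from the answer, restated so this snippet is self-contained.
var utf8_to_latin1 = function (s) {
  return unescape(encodeURIComponent(s));
};
var latin1_to_utf8 = function (s) {
  return decodeURIComponent(escape(s));
};

// "\u00E9" (é) is one UTF-16 code unit but two UTF-8 bytes (0xC3 0xA9).
var bytes = utf8_to_latin1("\u00E9");
console.log(bytes.length);                      // 2
console.log(bytes.charCodeAt(0).toString(16));  // c3
console.log(bytes.charCodeAt(1).toString(16));  // a9

// Round trip back to the original JavaScript string.
console.log(latin1_to_utf8(bytes) === "\u00E9"); // true

// A byte string that is not valid UTF-8 makes decodeURIComponent() throw:
// escape("\u00E9") gives "%E9", a lead byte with no continuation bytes.
try {
  latin1_to_utf8("\u00E9");
} catch (e) {
  console.log(e instanceof URIError); // true
}
```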

Hope this helps.


Mozilla says that escape() is deprecated.

Yes, you should avoid both escape() and unescape().

Simply put, is it okay to use encodeURI() and decodeURI() for utf-8 strings?

Yes, but depending on the form of your input and the required form of your output you may need some extra work.

From your question, I assume you have a JavaScript string, you want to convert its encoding to UTF-8, and you finally want to store the string in some escaped form.

First of all, it's important to note that JavaScript strings are encoded as UCS-2 (more precisely, as sequences of 16-bit UTF-16 code units), which is different from UTF-8.

See: https://mathiasbynens.be/notes/javascript-encoding
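Here is a quick illustration of the UCS-2/UTF-16 point, using only standard `String` methods. The two sample characters are arbitrary choices for the example. `length` counts 16-bit code units, so a character outside the Basic Multilingual Plane counts as two:

```javascript
// JavaScript strings are sequences of 16-bit code units.
// Characters above U+FFFF take two units (a surrogate pair).
const euro = "\u20AC";       // €, U+20AC
const grin = "\uD83D\uDE00"; // 😀, U+1F600

console.log(euro.length);                      // 1
console.log(grin.length);                      // 2
console.log(grin.charCodeAt(0).toString(16));  // d83d
console.log(grin.charCodeAt(1).toString(16));  // de00
console.log(grin.codePointAt(0).toString(16)); // 1f600
```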

encodeURIComponent() is good for the job, as it turns the UCS-2 JavaScript string into UTF-8 and escapes it as a sequence of %nn substrings, where each nn is the two hex digits of one byte.
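This minimal sketch shows the byte-per-`%nn` behaviour. It also covers one caveat worth knowing: a lone surrogate has no UTF-8 encoding, so `encodeURIComponent()` throws a `URIError` instead of producing output.

```javascript
// encodeURIComponent() converts the string to UTF-8, then writes each byte
// that needs escaping as %XX with uppercase hex digits.
console.log(encodeURIComponent("\u20AC"));       // %E2%82%AC (3 bytes)
console.log(encodeURIComponent("\uD83D\uDE00")); // %F0%9F%98%80 (4 bytes from one surrogate pair)

// A lone surrogate cannot be encoded as UTF-8, so encodeURIComponent() throws.
try {
  encodeURIComponent("\uD83D");
} catch (e) {
  console.log(e instanceof URIError); // true
}
```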

However, encodeURIComponent() does not escape letters, digits, and a few other ASCII characters (- _ . ! ~ * ' ( )). But this is easy to fix.
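Here is one way to do that fix when you want every byte escaped. It is a sketch built around a hypothetical helper, `encodeAllBytes`, which is not from the answer. The regular expression matches either an existing `%XX` escape, which it keeps unchanged, or a single unescaped character, which it escapes. This avoids accidentally re-escaping the hex digits inside `%E2`:

```javascript
// Hypothetical helper: percent-encode EVERY UTF-8 byte, including the ASCII
// characters encodeURIComponent() leaves alone (A-Z a-z 0-9 - _ . ! ~ * ' ( )).
function encodeAllBytes(s) {
  return encodeURIComponent(s).replace(/%[0-9A-F]{2}|[^%]/g, function (m) {
    // Bytes that are already escaped ("%XX") are kept as they are.
    if (m.charAt(0) === "%") {
      return m;
    }
    // Anything else is one unescaped ASCII character: escape it as well.
    var hex = m.charCodeAt(0).toString(16).toUpperCase();
    return "%" + (hex.length < 2 ? "0" + hex : hex);
  });
}

console.log(encodeURIComponent("a-\u20AC")); // a-%E2%82%AC
console.log(encodeAllBytes("a-\u20AC"));     // %61%2D%E2%82%AC
```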

For example, if you want to turn a JavaScript string into an array of numbers representing the bytes of its UTF-8 encoding, you can use this function:

//
// Convert JavaScript UCS2 string to array of bytes representing the string UTF8 encoded
//
function StringUTF8AsBytesArrayFromString( s )
{
    var i,
        n,
        u;

    u = [];
    s = encodeURIComponent( s );

    n = s.length;
    for( i = 0; i < n; i++ )
    {
        if( s.charAt( i ) == '%' )
        {
            u.push( parseInt( s.substring( i + 1, i + 3 ), 16 ) );
            i += 2;
        }
        else
        {
            u.push( s.charCodeAt( i ) );
        }
    }

    return u;
}
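One way to sanity-check the function's output is to compare it with the standard `TextEncoder` API. This is a swapped-in technique, not part of the answer. It is available in current browsers and in Node.js 11 and later. The sketch below restates the function's loop in condensed form under a hypothetical name so the block runs on its own. One difference to keep in mind: `TextEncoder` replaces a lone surrogate with U+FFFD, whereas `encodeURIComponent()` throws.

```javascript
// Condensed restatement of the answer's approach (hypothetical name):
// encodeURIComponent() followed by parsing each %XX back into a number.
function utf8BytesViaURI(s) {
  var escaped = encodeURIComponent(s);
  var bytes = [];
  for (var i = 0; i < escaped.length; i++) {
    if (escaped.charAt(i) === "%") {
      bytes.push(parseInt(escaped.substring(i + 1, i + 3), 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      bytes.push(escaped.charCodeAt(i)); // an unescaped ASCII char is its own byte
    }
  }
  return bytes;
}

// Reference implementation using the Encoding Standard's TextEncoder (UTF-8 only).
function utf8BytesViaTextEncoder(s) {
  return Array.from(new TextEncoder().encode(s));
}

var sample = "A\u00E9\u20AC\uD83D\uDE00"; // A, é, €, 😀
console.log(utf8BytesViaURI(sample).join(","));
// 65,195,169,226,130,172,240,159,152,128
console.log(utf8BytesViaURI(sample).join(",") === utf8BytesViaTextEncoder(sample).join(",")); // true
```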

If you want to turn the string into its hexadecimal representation:

//
// Convert JavaScript UCS2 string to hex string representing the bytes of the string UTF8 encoded
//
function StringUTF8AsHexFromString( s )
{
    var u,
        i,
        n;

    u = StringUTF8AsBytesArrayFromString( s );
    n = u.length;
    s = ''; // reuse the parameter to accumulate the hex output

    for( i = 0; i < n; i++ )
    {
        // Pad values below 16 so every byte becomes exactly two hex digits
        s += ( u[ i ] < 16 ? '0' : '' ) + u[ i ].toString( 16 );
    }

    return s;
}

If you change the line in the for loop to

s += '%' + ( u[ i ] < 16 ? '0' : '' ) + u[ i ].toString( 16 );

(adding a % sign before each pair of hex digits, i.e. before each byte), the resulting escaped string (UTF-8 encoded) can be turned back into a JavaScript UCS-2 string with decodeURIComponent().
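Going the other way from the plain hex form works the same way: insert a `%` before every pair of hex digits and hand the result to `decodeURIComponent()`. Below is a minimal sketch using a hypothetical helper name, `stringFromUTF8Hex`. The input check is an assumption added for safety. Note that `decodeURIComponent()` accepts lowercase hex, which matches the lowercase output of `toString( 16 )`.

```javascript
// Hypothetical helper: turn a plain hex string of UTF-8 bytes (for example,
// the output of StringUTF8AsHexFromString) back into a JavaScript string.
function stringFromUTF8Hex(hex) {
  // Assumed input contract: an even number of hex digits and nothing else.
  if (hex.length % 2 !== 0 || /[^0-9a-f]/i.test(hex)) {
    throw new Error("expected an even-length hex string");
  }
  // "e282ac" -> "%e2%82%ac" -> "€"
  return decodeURIComponent(hex.replace(/[0-9a-f]{2}/gi, "%$&"));
}

console.log(stringFromUTF8Hex("e282ac") === "\u20AC");   // true
console.log(decodeURIComponent("%c3%a9") === "\u00E9"); // true (hex case does not matter)
```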


It is never okay to use encodeURI() or encodeURIComponent(). Let's try it out:

console.log(encodeURIComponent('@#*'));
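For reference, here is what the snippet prints in a standards-compliant engine, with encodeURI() alongside for comparison. Neither function escapes `*`. encodeURI() additionally leaves `@` and `#` untouched, because it treats them as URI syntax characters.

```javascript
// Output of the snippet above, plus the encodeURI() equivalent.
console.log(encodeURIComponent("@#*")); // %40%23*
console.log(encodeURI("@#*"));          // @#*
```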

