HTML-encoding lost when attribute read from input field

javascript jquery html escaping html-escape-characters

EDIT: This answer was posted a long time ago, and its htmlDecode function introduced an XSS vulnerability. I have since changed the temporary element from a div to a textarea, which reduces the XSS risk. Nowadays, though, I would encourage you to use the DOMParser API, as suggested in another answer.

I use these functions:

function htmlEncode(value){
  // Create an in-memory element and set its inner text (which is automatically encoded).
  // Then grab the encoded contents back out. The element never exists on the DOM.
  return $('<textarea/>').text(value).html();
}

function htmlDecode(value){
  return $('<textarea/>').html(value).text();
}

Basically a textarea element is created in memory, but it is never appended to the document.

In the htmlEncode function I set the element's text and read back the encoded innerHTML. In the htmlDecode function I set the element's innerHTML and read back its text.

The jQuery trick doesn't encode quote marks and in IE it will strip your whitespace.

I based this function on Django's escape template tag, which I assume is already heavily used and tested. It does what's needed.

It's arguably simpler (and possibly faster) than any of the workarounds for the whitespace-stripping issue - and it encodes quote marks, which is essential if you're going to use the result inside an attribute value for example.

function htmlEscape(str) {
    return str
        .replace(/&/g, '&amp;')   // '&' first, so the entities added below aren't re-escaped
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// I needed the opposite function today, so adding here too:
function htmlUnescape(str){
    return str
        .replace(/&quot;/g, '"')
        .replace(/&#39;/g, "'")
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&amp;/g, '&');  // '&amp;' last, so '&amp;lt;' decodes to '&lt;', not '<'
}
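One way to avoid the ordering concerns entirely is to do each direction in a single pass with one regex and a lookup table. This technique is swapped in for illustration and is not the Django-based version above. The helper names `htmlEscapeOnePass` and `htmlUnescapeOnePass` are hypothetical. It should produce the same strings as the replace chains, but no performance claim is made.

```javascript
// Single-pass escaping: one regex, one lookup table.
const ESCAPE_MAP = {
  '&': '&amp;',
  '"': '&quot;',
  "'": '&#39;',
  '<': '&lt;',
  '>': '&gt;',
};

function htmlEscapeOnePass(str) {
  // Each special character is replaced exactly once, so '&' needs no special ordering.
  return String(str).replace(/[&"'<>]/g, (ch) => ESCAPE_MAP[ch]);
}

// Reverse table, keyed by the exact entity text that htmlEscapeOnePass produces.
const UNESCAPE_MAP = {
  '&amp;': '&',
  '&quot;': '"',
  '&#39;': "'",
  '&lt;': '<',
  '&gt;': '>',
};

function htmlUnescapeOnePass(str) {
  // The regex scans left to right and never revisits its own output, so
  // '&amp;lt;' becomes '&lt;' (not '<'). The chained version gets the same
  // result by handling '&amp;' last.
  return String(str).replace(/&(?:amp|quot|#39|lt|gt);/g, (ent) => UNESCAPE_MAP[ent]);
}
```

Note that the unescape side only recognizes the five entities the escape side emits. It is not a general HTML entity decoder.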

Update 2013-06-17:
While searching for the fastest way to escape, I found this implementation of a replaceAll method:
http://dumpsite.com/forum/index.php?topic=4.msg29#msg29
(also referenced here: Fastest method to replace all instances of a character in a string)
Some performance results here:
http://jsperf.com/htmlencoderegex/25

It gives a result string identical to the built-in replace chains above. I'd be very happy if someone could explain why it's faster!?
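The dumpsite replaceAll implementation is not reproduced here, and no timing claims are made. As a hedged illustration of replacing every occurrence without a regex, the widely used split/join idiom can rebuild the same escape, provided '&' is still handled first. Both helper names below are hypothetical. Modern engines also provide the built-in `String.prototype.replaceAll` (ES2021) for the same purpose.

```javascript
// Replace every occurrence of `search` in `str` using split/join (no regex involved).
function replaceAllSplitJoin(str, search, replacement) {
  return str.split(search).join(replacement);
}

// Same order as the regex chain: '&' first, so the '&' inside
// freshly inserted entities is not escaped a second time.
function htmlEscapeSplitJoin(str) {
  let out = replaceAllSplitJoin(str, '&', '&amp;');
  out = replaceAllSplitJoin(out, '"', '&quot;');
  out = replaceAllSplitJoin(out, "'", '&#39;');
  out = replaceAllSplitJoin(out, '<', '&lt;');
  out = replaceAllSplitJoin(out, '>', '&gt;');
  return out;
}
```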

Update 2015-03-04:
I just noticed that AngularJS uses exactly the method above:
https://github.com/angular/angular.js/blob/v1.3.14/src/ngSanitize/sanitize.js#L435

They add a couple of refinements. They appear to handle an obscure Unicode issue, and they also convert all non-alphanumeric characters to entities. I was under the impression that the latter was not necessary as long as your document specifies a UTF-8 charset.
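AngularJS's actual code is not reproduced here. The hypothetical helper `encodeNonAscii` below is a narrower sketch of the kind of Unicode handling involved. It emits a decimal numeric character reference for every character outside ASCII. It walks the string by code point, so an astral character stored as a UTF-16 surrogate pair becomes a single reference. Encoding each UTF-16 unit separately with `charCodeAt` would instead turn 😀 into `&#55357;&#56832;`. Those are two references to lone surrogates, which an HTML parser replaces with U+FFFD rather than producing the emoji.

```javascript
// Hypothetical helper illustrating the idea; this is NOT AngularJS's implementation.
// Intended to run on text that has already been HTML-escaped for & " ' < >.
function encodeNonAscii(str) {
  let out = '';
  // for...of iterates by Unicode code point, so a surrogate pair is visited as one character.
  for (const ch of str) {
    // codePointAt combines a pair as (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000.
    const cp = ch.codePointAt(0);
    // ASCII passes through unchanged; everything else becomes a decimal reference.
    out += cp > 0x7e ? '&#' + cp + ';' : ch;
  }
  return out;
}
```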

I will note that (4 years later) Django still does not do either of these things, so I'm not sure how important they are:
https://github.com/django/django/blob/1.8b1/django/utils/html.py#L44

Update 2016-04-06:
You may also wish to escape the forward slash /. This is not required for correct HTML encoding. However, OWASP recommends it as an anti-XSS safety measure (thanks to @JNF for suggesting this in the comments).

        .replace(/\//g, '&#x2F;');
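Putting the update together, the full escape function with the slash line appended might look like the sketch below. It simply combines the lines above. The slash replacement can go last because '&' is already handled first, so the '&' in `&#x2F;` is not re-escaped. A matching unescape would need a corresponding `.replace(/&#x2F;/g, '/')` placed before its final '&amp;' step.

```javascript
function htmlEscape(str) {
  return str
    .replace(/&/g, '&amp;')    // must run first
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/\//g, '&#x2F;'); // optional OWASP-recommended extra
}
```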

Here's a non-jQuery version that is considerably faster than both the jQuery .html() version and the .replace() version. This preserves all whitespace, but like the jQuery version, doesn't handle quotes.

function htmlEncode( html ) {
    return document.createElement( 'a' ).appendChild(
        document.createTextNode( html ) ).parentNode.innerHTML;
}

Speed: http://jsperf.com/htmlencoderegex/17
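Why are quotes left alone? My understanding of the HTML spec's fragment serialization algorithm is that text node data is escaped by replacing only &, U+00A0 (no-break space), < and >. Quotes are escaped only when serializing attribute values. The hypothetical pure-JavaScript function below mimics that text-node rule so the behavior can be checked outside a browser. It is an approximation for illustration, not a replacement for the DOM version.

```javascript
// Approximates what innerHTML serialization does to a text node's data
// (non-attribute mode): only &, U+00A0, < and > are replaced.
function escapeLikeTextNode(text) {
  return text
    .replace(/&/g, '&amp;')       // first, so later entities keep their '&'
    .replace(/\u00A0/g, '&nbsp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  // Quotes are intentionally untouched, as in the DOM-based htmlEncode.
}
```

This also shows why the DOM version preserves whitespace: runs of ordinary spaces pass through unchanged.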

Demo:

Script:

function htmlEncode( html ) {
    return document.createElement( 'a' ).appendChild(
        document.createTextNode( html ) ).parentNode.innerHTML;
}

function htmlDecode( html ) {
    // Caution: assigning untrusted markup to innerHTML, even on a detached element,
    // can run inline event handlers such as <img onerror>.
    var a = document.createElement( 'a' ); a.innerHTML = html;
    return a.textContent;
}

document.getElementById( 'text' ).value = htmlEncode( document.getElementById( 'hidden' ).value );

//sanity check
var html = '<div>   & hello</div>';
document.getElementById( 'same' ).textContent =
      'html === htmlDecode( htmlEncode( html ) ): '
    + ( html === htmlDecode( htmlEncode( html ) ) );

HTML:

<input id="hidden" type="hidden" value="chalk    & cheese" />
<input id="text" value="" />
<div id="same"></div>

