Sanitize/Rewrite HTML on the Client Side

javascript html security html-sanitizing

Update 2016: There is now a Google Closure package based on the Caja sanitizer.

It has a cleaner API, was rewritten to take into account APIs available on modern browsers, and interacts better with Closure Compiler.

Shameless plug: see caja/plugin/html-sanitizer.js for a client-side HTML sanitizer that has been thoroughly reviewed.

It is whitelist-based, not blacklist-based, and the whitelists are configurable as described in CajaWhitelists.

If you want to remove all tags, then do the following:

    var tagBody = '(?:[^"\'>]|"[^"]*"|\'[^\']*\')*';

    var tagOrComment = new RegExp(
        '<(?:'
        // Comment body.
        + '!--(?:(?:-*[^->])*--+|-?)'
        // Special "raw text" elements whose content should be elided.
        + '|script\\b' + tagBody + '>[\\s\\S]*?</script\\s*'
        + '|style\\b' + tagBody + '>[\\s\\S]*?</style\\s*'
        // Regular name
        + '|/?[a-z]'
        + tagBody
        + ')>',
        'gi');

    function removeTags(html) {
      var oldHtml;
      // Repeat until nothing changes, in case removing one tag exposes another.
      do {
        oldHtml = html;
        html = html.replace(tagOrComment, '');
      } while (html !== oldHtml);
      // Escape any remaining '<' so leftovers cannot start a tag.
      return html.replace(/</g, '&lt;');
    }

People will tell you that you can create an element, and assign innerHTML and then get the innerText or textContent, and then escape entities in that. Do not do that. It is vulnerable to XSS injection since <img src=bogus onerror=alert(1337)> will run the onerror handler even if the node is never attached to the DOM.
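If the goal is only to show untrusted input as literal text rather than to strip its tags, the escaping step can be done with plain string replacement, so the HTML parser never sees the input. A minimal sketch; escapeHtml is a name made up for this example, not part of Caja:

```javascript
// Hypothetical helper (not a Caja API): escapes the characters that are
// significant in HTML text and attribute values. It works purely on strings,
// so no element is created and nothing is parsed as HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')   // must run first so the entities below are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

var untrusted = '<img src=bogus onerror=alert(1337)>';
console.log(escapeHtml(untrusted));
// → &lt;img src=bogus onerror=alert(1337)&gt;
```

Because the payload is never assigned to innerHTML, the onerror handler has no chance to run.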


The Google Caja HTML sanitizer can be made "web-ready" by embedding it in a web worker. Any global variables introduced by the sanitizer will be contained within the worker, plus processing takes place in its own thread.

For browsers that do not support Web Workers, we can use an iframe as a separate environment for the sanitizer to work in. Timothy Chien has a polyfill that does just this, using iframes to simulate Web Workers, so that part is done for us.

The Caja project has a wiki page on how to use Caja as a standalone client-side sanitizer:

Check out the source, then build by running ant
Include html-sanitizer-minified.js or html-css-sanitizer-minified.js in your page
Call html_sanitize(...)

The worker script only needs to follow those instructions:

    importScripts('html-css-sanitizer-minified.js'); // or 'html-sanitizer-minified.js'

    var urlTransformer, nameIdClassTransformer;

    // customize if you need to filter URLs and/or ids/names/classes
    urlTransformer = nameIdClassTransformer = function(s) { return s; };

    // when we receive some HTML
    self.onmessage = function(event) {
        // sanitize, then send the result back
        postMessage(html_sanitize(event.data, urlTransformer, nameIdClassTransformer));
    };

(A bit more code is needed to get the simworker library working, but it's not important to this discussion.)
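The page side of this exchange could look roughly like the following. This is a sketch under assumptions: sanitizeWithWorker is a hypothetical helper, and 'sanitizer-worker.js' is a file name chosen here for the worker script; neither comes from Caja or the simworker polyfill.

```javascript
// Hypothetical helper: wraps one request/response round trip with the
// sanitizer worker in a Promise. `worker` is any object with postMessage,
// onmessage and onerror, e.g. new Worker('sanitizer-worker.js').
function sanitizeWithWorker(worker, dirtyHtml) {
  return new Promise(function (resolve, reject) {
    worker.onmessage = function (event) {
      resolve(event.data); // sanitized HTML sent back by the worker
    };
    worker.onerror = function (err) {
      reject(err);
    };
    worker.postMessage(dirtyHtml);
  });
}

// Browser usage (assumes the worker script is saved as 'sanitizer-worker.js'):
//
//   var worker = new Worker('sanitizer-worker.js');
//   sanitizeWithWorker(worker, untrustedHtml).then(function (clean) {
//     // `clean` is the sanitizer's output and can now be inserted into the page
//   });
```

The helper overwrites onmessage on every call, so it handles one request at a time; concurrent requests would need message ids or a queue.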

Demo: https://dl.dropbox.com/u/291406/html-sanitize/demo.html


Never trust the client. If you're writing a server application, assume that the client will always submit unsanitized, malicious data. It's a rule of thumb that will keep you out of trouble. If you can, I would advise doing all validation and sanitization in server code, which you know (to a reasonable degree) won't be tampered with. Perhaps you could use a server-side web application as a proxy for your client-side code, one that fetches from the third party and sanitizes the content before sending it to the client?

[edit] I'm sorry, I misunderstood the question. However, I stand by my advice. Your users will probably be safer if you sanitize on the server before sending it to them.
