PHP - HTML Purifier - hello w<o>rld/world tutorial striptags


I've been using HTMLPurifier for sanitizing the output of a rich text editor, and ended up with:

    include_once('htmlpurifier/library/HTMLPurifier.auto.php');

    $config = HTMLPurifier_Config::createDefault();
    // HTML Purifier 4.x takes 'Namespace.Directive' keys; the older
    // three-argument form set('Core', 'Encoding', ...) is deprecated.
    $config->set('Core.Encoding', 'UTF-8');
    $config->set('HTML.Doctype', 'HTML 4.01 Transitional');

    if (defined('PURIFIER_CACHE')) {
        $config->set('Cache.SerializerPath', PURIFIER_CACHE);
    } else {
        # Disable the cache entirely
        $config->set('Cache.DefinitionImpl', null);
    }

    # Help out the Purifier a bit, until it develops this functionality:
    # repeatedly strip empty <em>/<strong> pairs, keeping their whitespace.
    # $input holds the raw HTML from the rich text editor.
    while (($cleaner = preg_replace('!<(em|strong)>(\s*)</\1>!', '$2', $input)) != $input) {
        $input = $cleaner;
    }

    $filter = new HTMLPurifier($config);
    $output = $filter->purify($input);

The main points of interest:

  1. Include the autoloader.
  2. Create an instance of HTMLPurifier_Config as $config.
  3. Set configuration settings as needed, with $config->set().
  4. Create an instance of HTMLPurifier, passing $config to it.
  5. Use $filter->purify() on your input.

However, it's entirely overkill for something that doesn't need to allow any HTML in the output.


You should validate the input according to what it is supposed to contain. For a name, for example, use a regexp such as:

'/^([A-Z][a-z]+[ ]?)+$/' // ASCII only, but easy to extend; ^ and $ make the whole string match

This validation should do the job well. Then escape the output when printing it on the page, preferably with htmlspecialchars().
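
A minimal sketch of that validate-then-escape approach. The function names isValidName() and renderName() are made up for illustration, and the pattern is anchored with ^ and $ so that the whole string has to match rather than just part of it:

```php
<?php
// Hypothetical helper: accepts only capitalized ASCII words,
// each optionally followed by a single space.
function isValidName(string $name): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error
    return preg_match('/^([A-Z][a-z]+[ ]?)+$/', $name) === 1;
}

// Hypothetical helper: escape the value at output time, whether or not
// it passed validation earlier.
function renderName(string $name): string
{
    return htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
}

var_dump(isValidName('John Smith'));   // bool(true)
var_dump(isValidName('<b>John</b>'));  // bool(false)
echo renderName('w<o>rld'), "\n";      // w&lt;o&gt;rld
```

Validation rejects input you never want to store; escaping is still applied on output so that anything that slips through is displayed as text instead of being interpreted as markup.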


You can use something like htmlspecialchars() to preserve the characters the user typed in without the browser interpreting them as HTML.
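
To illustrate with the string from the question title, here is a small comparison of htmlspecialchars() and strip_tags(). The variable names are only for this example:

```php
<?php
$input = 'hello w<o>rld';

// htmlspecialchars() keeps every character; < and > become entities,
// so the browser shows them literally.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// strip_tags() removes anything that looks like a tag, so "<o>" is lost.
$stripped = strip_tags($input);

echo $escaped, "\n";   // hello w&lt;o&gt;rld
echo $stripped, "\n";  // hello wrld
```

If the user really meant to type "w<o>rld", only the escaped version shows it as entered.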