PHP - HTML Purifier - hello w<o>rld/world tutorial striptags
I've been using HTMLPurifier for sanitizing the output of a rich text editor, and ended up with:
    include_once('htmlpurifier/library/HTMLPurifier.auto.php');

    $config = HTMLPurifier_Config::createDefault();
    # HTML Purifier 4.x takes 'Namespace.Directive' keys; the three-argument
    # form, e.g. set('Core', 'Encoding', 'UTF-8'), is the older deprecated spelling.
    $config->set('Core.Encoding', 'UTF-8');
    $config->set('HTML.Doctype', 'HTML 4.01 Transitional');

    if (defined('PURIFIER_CACHE')) {
        $config->set('Cache.SerializerPath', PURIFIER_CACHE);
    } else {
        # Disable the cache entirely
        $config->set('Cache.DefinitionImpl', null);
    }

    # Help out the Purifier a bit, until it develops this functionality:
    # repeatedly drop empty <em></em> and <strong></strong> pairs, keeping any whitespace.
    while (($cleaner = preg_replace('!<(em|strong)>(\s*)</\1>!', '$2', $input)) !== $input) {
        $input = $cleaner;
    }

    $filter = new HTMLPurifier($config);
    $output = $filter->purify($input);
The main points of interest:
- Include the autoloader.
- Create an instance of HTMLPurifier_Config as $config.
- Set configuration settings as needed, with $config->set().
- Create an instance of HTMLPurifier, passing $config to it.
- Use $filter->purify() on your input.
However, it's entirely overkill for something that doesn't need to allow any HTML in the output.
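You should validate input based on what the content is supposed to be. For a name, for example, a regexp like this is a better fit:
'/^([A-Z][a-z]+[ ]?)+$/' //ascii only, but not problematic to extend
The ^ and $ anchors matter: without them the pattern accepts any string that merely contains a capitalised word. This validation should do the job well. Then escape the output when printing it on the page, preferably with htmlspecialchars().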
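A rough sketch of that validation step, assuming PHP 7 or later. The function name isValidName is made up for illustration. The D modifier is an addition here, so that $ does not also match just before a trailing newline:

```php
<?php
// Hypothetical helper: the name and the exact pattern are illustrative only.
function isValidName(string $name): bool
{
    // ^ and $ tie the match to the whole string; D makes $ match only at
    // the very end, not before a final newline. ASCII letters only.
    return preg_match('/^([A-Z][a-z]+[ ]?)+$/D', $name) === 1;
}

var_dump(isValidName('John Smith'));    // bool(true)
var_dump(isValidName('hello w<o>rld')); // bool(false)
```

Anything that fails the check can be rejected up front. Whatever passes should still be escaped on output.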
You can use something like htmlspecialchars() to preserve the characters the user typed in without the browser interpreting them as HTML.
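For the string in the question, that could look roughly like this. The wrapper name escapeForHtml is hypothetical, and the ENT_QUOTES flag and UTF-8 charset are assumptions about your page rather than requirements:

```php
<?php
// Hypothetical wrapper; htmlspecialchars() itself is a PHP built-in.
function escapeForHtml(string $text): string
{
    // ENT_QUOTES escapes single quotes as well as double quotes, so the
    // result can also be used inside quoted attribute values.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo escapeForHtml('hello w<o>rld'), "\n"; // hello w&lt;o&gt;rld
```

The browser then displays hello w<o>rld exactly as typed, instead of treating <o> as a tag.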