Sanitizing SVG using PHP
I work with XML and PHP, but I am not sure this fully answers your question. Please take it as an idea or suggestion, nothing more.
SimpleXML uses libxml to load the XML content: http://www.php.net/manual/en/simplexml.requirements.php
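You can disable the loading of external entities using:
libxml_disable_entity_loader(true);
http://www.php.net/manual/en/function.libxml-disable-entity-loader.php
before loading your file with SimpleXML. Note that this function is deprecated as of PHP 8.0; with libxml 2.9.0 or later, entity substitution is disabled by default unless you pass LIBXML_NOENT.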
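A minimal sketch of that loading step, assuming the SVG arrives as a string. The function name loadUntrustedXml is hypothetical, and the version guard is there because libxml_disable_entity_loader is deprecated from PHP 8.0 onward:

```php
<?php
// Hypothetical helper: loads untrusted XML with SimpleXML.
function loadUntrustedXml(string $xml): SimpleXMLElement
{
    // Only call the entity-loader switch on PHP versions before 8.0.
    if (PHP_VERSION_ID < 80000) {
        libxml_disable_entity_loader(true);
    }

    // Collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);

    // LIBXML_NONET forbids network access during parsing.
    // LIBXML_NOENT is deliberately not passed, so entities are not substituted.
    $element = simplexml_load_string($xml, SimpleXMLElement::class, LIBXML_NONET);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($element === false) {
        throw new InvalidArgumentException('Input is not well-formed XML.');
    }

    return $element;
}
```

This only covers parsing; it does not remove anything dangerous from the document itself.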
Then you could validate against an SVG schema or DTD:
http://us3.php.net/manual/en/domdocument.schemavalidate.php or http://us3.php.net/manual/en/domdocument.validate.php
The only concern I see is that an SVG file can contain a script element: http://www.w3.org/TR/SVG/script.html#ScriptElement
There is information on the SVG 1.1 DTD here: http://www.w3.org/Graphics/SVG/1.1/DTD/svg-framework.mod and http://www.w3.org/TR/2003/REC-SVG11-20030114/REC-SVG11-20030114.pdf
You might provide an SVG DTD with a modified version of the script element, or loop through the elements and remove any script element you find.
It won't be perfect, but at least better than nothing.
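The "loop through the elements" idea could look roughly like the following. It uses DOMDocument and DOMXPath rather than SimpleXML, since removing nodes is simpler there; the function name removeSvgScripts is hypothetical. It only strips script elements and leaves event-handler attributes and other vectors alone, so treat it as an illustration rather than a sanitizer:

```php
<?php
// Hypothetical helper: strips <script> elements from an SVG string.
function removeSvgScripts(string $svg): string
{
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // LIBXML_NONET blocks network access while parsing.
    $loaded = $doc->loadXML($svg, LIBXML_NONET);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new InvalidArgumentException('Input is not well-formed XML.');
    }

    $xpath = new DOMXPath($doc);
    // local-name() matches script elements regardless of namespace prefix.
    $scripts = $xpath->query('//*[local-name() = "script"]');

    // Copy the result to an array before removing nodes.
    foreach (iterator_to_array($scripts, false) as $node) {
        $node->parentNode->removeChild($node);
    }

    return $doc->saveXML();
}
```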
You need to sanitize SVG using an XML parser plus a whitelist.
Because SVG already has multiple ways to execute code and future extensions may add additional methods, you simply cannot blacklist "known dangerous" constructs. Whitelisting safe elements and attributes does work as long as you correctly handle all the XML corner cases (e.g. XSLT stylesheets, entity expansions, external entity references).
Example implementations: https://github.com/alnorris/SVG-Sanitizer/blob/master/SvgSanitizer.php (MIT license) or https://github.com/darylldoyle/svg-sanitizer (GPL v2 license)
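As a rough illustration of the whitelist idea (not taken from either library above), here is a sketch using DOMDocument. The allowed element and attribute lists are deliberately tiny and are assumptions for the example; the function name sanitizeSvgWithAllowlist is hypothetical. Attribute values are not validated here, which a real implementation would also need to do:

```php
<?php
const SVG_NS = 'http://www.w3.org/2000/svg';
// Illustrative lists only; a real allowlist needs many more entries and careful review.
const ALLOWED_ELEMENTS = ['svg', 'g', 'path', 'rect', 'circle', 'title'];
const ALLOWED_ATTRIBUTES = ['d', 'x', 'y', 'cx', 'cy', 'r', 'width', 'height', 'viewBox', 'fill', 'stroke'];

// Hypothetical helper: keeps only allowlisted SVG elements and attributes.
function sanitizeSvgWithAllowlist(string $svg): string
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // LIBXML_NONET blocks network access; LIBXML_NOENT is not passed.
    $loaded = $doc->loadXML($svg, LIBXML_NONET);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new InvalidArgumentException('Input is not well-formed XML.');
    }
    // Blunt rule: refuse any DOCTYPE so custom entity declarations are never accepted.
    if ($doc->doctype !== null) {
        throw new InvalidArgumentException('DOCTYPE declarations are not accepted.');
    }
    $root = $doc->documentElement;
    if ($root === null || $root->namespaceURI !== SVG_NS || $root->localName !== 'svg') {
        throw new InvalidArgumentException('Root element must be an SVG svg element.');
    }

    $xpath = new DOMXPath($doc);

    // Drop processing instructions (e.g. xml-stylesheet) and comments.
    $misc = $xpath->query('//processing-instruction() | //comment()');
    foreach (iterator_to_array($misc, false) as $node) {
        $node->parentNode->removeChild($node);
    }

    foreach (iterator_to_array($xpath->query('//*'), false) as $el) {
        $allowed = $el->namespaceURI === SVG_NS
            && in_array($el->localName, ALLOWED_ELEMENTS, true);
        if (!$allowed) {
            // Removes the element together with its whole subtree.
            $el->parentNode->removeChild($el);
            continue;
        }
        // Only un-namespaced, allowlisted attributes survive.
        foreach (iterator_to_array($el->attributes, false) as $attr) {
            if ($attr->namespaceURI !== null
                || !in_array($attr->localName, ALLOWED_ATTRIBUTES, true)) {
                $el->removeAttributeNode($attr);
            }
        }
    }

    return $doc->saveXML();
}
```

Anything not explicitly listed is dropped, so a new or unknown SVG feature is removed by default instead of slipping through.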
More information about attack vectors that you have to consider while selecting which features you want to support:
- https://phabricator.wikimedia.org/T85850 (base64 encoded parts)
- https://www.slideshare.net/x00mario/the-image-that-called-me (different ways to execute code)
- https://www.blackhat.com/docs/us-14/materials/us-14-DeGraaf-SVG-Exploiting-Browsers-Without-Image-Parsing-Bugs.pdf (embedding HTML inside SVG; SVG can do pretty much anything any XML file and any HTML file can do; using SVG inside <object> allows JS from inside the SVG to execute in the parent document)
- https://bjornjohansen.no/svg-in-wordpress (filtering SVG is hard enough that even WordPress still does not have a good solution for user-submitted SVG files)
- http://html5sec.org/?svg (list of some known SVG attacks by misusing different APIs)
- https://security.stackexchange.com/questions/26264
- https://blobfolio.com/2017/03/when-a-stranger-calls-sanitizing-svgs/ (different ways to encode stuff, clever use of whitespace to avoid detection, xml tricks)
You can use the SVG Sanitize package: https://packagist.org/packages/enshrined/svg-sanitize
It had 500k installs as of the date this answer was written.
use enshrined\svgSanitize\Sanitizer;

// Create a new sanitizer instance
$sanitizer = new Sanitizer();

// Load the dirty svg
$dirtySVG = file_get_contents('filthy.svg');

// Pass it to the sanitizer and get it back clean
$cleanSVG = $sanitizer->sanitize($dirtySVG);

// Now do what you want with your clean SVG/XML data