How can I use PHP's various XML libraries to get DOM-like functionality and avoid DoS vulnerabilities, like Billion Laughs or Quadratic Blowup?



Note: If you create test cases from files containing the XML chunks below, expect that text editors may be vulnerable to these attacks as well and might freeze or crash.

Billion Laughs

<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
  <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
  <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>

When loading:

FATAL: #89: Detected an entity reference loop 1:7
... (the same error six more times, seven in total including the one above)
FATAL: #89: Detected an entity reference loop 14:13

Result:

<?xml version="1.0"?>

Memory usage is light; DOMDocument does not raise the peak. Since this example produces seven fatal errors, one might conclude, and indeed it is so, that this reduced version loads without errors:

<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
]>
<lolz>&lol2;</lolz>
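To reproduce the "FATAL: #89 ..." lines programmatically rather than as PHP warnings, the errors can be buffered with `libxml_use_internal_errors()` and read back with `libxml_get_errors()`. The following is a minimal sketch; the helper name `loadXmlCollectingErrors()` is hypothetical, and the sample is the reduced three-level document above, which should load cleanly.

```php
<?php
// Sketch: load XML with DOMDocument while collecting libxml errors instead of
// letting them surface as PHP warnings. loadXmlCollectingErrors() is a
// hypothetical helper name, not part of PHP.

/** @return array{0: DOMDocument, 1: array<LibXMLError>} */
function loadXmlCollectingErrors(string $xml, int $options = 0): array
{
    $previous = libxml_use_internal_errors(true); // buffer errors internally
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml, $options);
    $errors = libxml_get_errors();

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return [$doc, $errors];
}

// The reduced three-level variant from above.
$small = '<?xml version="1.0"?><!DOCTYPE lolz [<!ENTITY lol "lol">'
    . '<!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">'
    . '<!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">'
    . ']><lolz>&lol2;</lolz>';

[$doc, $errors] = loadXmlCollectingErrors($small);
printf("%d error(s)\n", count($errors));

foreach ($errors as $e) {
    // Same information as the "#89: ... 14:13" lines above: code, message, line:column.
    printf("#%d: %s %d:%d\n", $e->code, trim($e->message), $e->line, $e->column);
}
```

Feeding the full ten-level document through the same helper should list the entity-loop errors instead, though the exact codes and messages depend on the libxml2 version.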

Since entity substitution is not in effect and this works, let's try the

Quadratic Blowup

Here it is, shortened for your viewing pleasure (my test variants are about 27 KB and 11 KB):

<?xml version="1.0"?>
<!DOCTYPE kaboom [
  <!ENTITY a "aaaaaaaaaaaaaaaaaa...">
]>
<kaboom>&a;&a;&a;&a;&a;&a;&a;&a;&a;...</kaboom>

If you use $doc->loadXML($src, LIBXML_NOENT);, the attack does work: as I write this, the script is still loading. It takes considerable time to load and consumes memory, which you can experiment with yourself. Without LIBXML_NOENT, it loads quickly and without problems.
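The timing difference can be reproduced with a generated document instead of a file. This is a sketch under stated assumptions: `quadraticBlowup()` and `timeLoad()` are hypothetical helpers, and the sizes are deliberately small so the script finishes quickly.

```php
<?php
// Sketch: generate a Quadratic Blowup document and time DOMDocument::loadXML()
// with and without LIBXML_NOENT. quadraticBlowup() and timeLoad() are
// hypothetical helpers; the sizes are kept small so the script finishes quickly.

libxml_use_internal_errors(true); // keep any parser errors out of the output

// One entity of $entityLength characters, referenced $refCount times.
function quadraticBlowup(int $entityLength, int $refCount): string
{
    return '<?xml version="1.0"?>'
        . '<!DOCTYPE kaboom [<!ENTITY a "' . str_repeat('a', $entityLength) . '">]>'
        . '<kaboom>' . str_repeat('&a;', $refCount) . '</kaboom>';
}

// Wall-clock seconds spent inside loadXML() for the given options.
function timeLoad(string $xml, int $options): float
{
    $doc = new DOMDocument();
    $start = microtime(true);
    $doc->loadXML($xml, $options);
    return microtime(true) - $start;
}

$xml = quadraticBlowup(500, 500); // about 2 KB of input, 250 000 characters once expanded
printf("input: %d bytes\n", strlen($xml));
printf("without LIBXML_NOENT: %.6f secs\n", timeLoad($xml, 0));
printf("with LIBXML_NOENT:    %.6f secs\n", timeLoad($xml, LIBXML_NOENT));
```

Raising both sizes should widen the gap. Newer libxml2 releases may instead reject very large expansions with an error, so results depend on the version reported by LIBXML_DOTTED_VERSION.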

There is a caveat, though: if you read the nodeValue of an element, for example, the entities are expanded even if you did not use that loading flag.
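A small document makes the caveat visible: loaded without LIBXML_NOENT, the tree keeps entity reference nodes and serializes them as `&a;`, yet `nodeValue` returns the fully expanded text. The sample below is a hypothetical, tiny stand-in for the real attack file.

```php
<?php
// Sketch illustrating the caveat: even without LIBXML_NOENT, reading
// nodeValue expands the entity references, while saveXML() keeps them as &a;.
// The small document is a hypothetical stand-in for the real attack file.

$xml = '<?xml version="1.0"?><!DOCTYPE kaboom [<!ENTITY a "aaaaaaaaaa">]>'
    . '<kaboom>' . str_repeat('&a;', 5) . '</kaboom>';

$doc = new DOMDocument();
$doc->loadXML($xml); // no LIBXML_NOENT: the tree holds entity reference nodes

$root = $doc->documentElement;
$serialized = $doc->saveXML($root); // serialize only the root element
$text = $root->nodeValue;           // expands every &a; reference

printf("saveXML(root): %s\n", $serialized);
printf("nodeValue length: %d\n", strlen($text));
```

With a large attack document, that single `nodeValue` read is where the expansion cost is paid, which matches the "Got Text (nodeValue)" timings in the figures further down.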

A workaround for this issue is to remove the DocumentType node from the document. Note the following code:

$doc = new DOMDocument();
$doc->loadXML($s); // where $s is a Quadratic Blowup XML string from above

// now remove the doctype node
foreach ($doc->childNodes as $child) {
    if ($child->nodeType === XML_DOCUMENT_TYPE_NODE) {
        $doc->removeChild($child);
        break;
    }
}

// Now the following is true:
assert($doc->doctype === NULL);
assert($doc->lastChild->nodeValue === '...');

// Note that entities remain unexpanded in the output XML.
// This is not so good, since the output now references an entity that is
// no longer declared, which makes the XML invalid.
// Better is a manual walk through all nodes looking for XML_ENTITY_REF_NODE.
assert($doc->saveXML() === "<?xml version=\"1.0\"?>\n<kaboom>&a;&a;&a;&a;&a;&a;&a;&a;&a;...</kaboom>\n");

// However, canonicalization will produce warnings because it must resolve entities:
assert($doc->C14N() === false);
// The warning will look like:
//    PHP Warning:  DOMNode::C14N(): Node XML_ENTITY_REF_NODE is invalid here

So while this workaround will prevent an XML document from consuming resources in a DoS, it makes it easy to generate invalid XML.
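The "manual walk" from the code comments above could look roughly like this. It is a sketch with an assumed policy: entity reference nodes are simply removed before the doctype, so their text is lost. `stripEntityReferences()` is a hypothetical helper, and references inside attribute values are not handled.

```php
<?php
// Sketch of the "manual walk" idea: remove every entity reference node from the
// element tree, then remove the doctype, so saveXML() no longer emits
// references to undeclared entities. stripEntityReferences() is a hypothetical
// helper; discarding the references (and therefore their text) is an assumed policy.
// Limitation: entity references inside attribute values are not handled here.

function stripEntityReferences(DOMDocument $doc): void
{
    // Collect first, because removing nodes from a live childNodes list while
    // iterating over it would skip siblings.
    $refs = [];
    $stack = $doc->documentElement !== null ? [$doc->documentElement] : [];
    while ($stack !== []) {
        $element = array_pop($stack);
        foreach ($element->childNodes as $child) {
            if ($child->nodeType === XML_ENTITY_REF_NODE) {
                $refs[] = $child; // do not descend into the entity's content
            } elseif ($child->nodeType === XML_ELEMENT_NODE) {
                $stack[] = $child;
            }
        }
    }
    foreach ($refs as $ref) {
        $ref->parentNode->removeChild($ref);
    }
    if ($doc->doctype !== null) {
        $doc->removeChild($doc->doctype);
    }
}

$doc = new DOMDocument();
$doc->loadXML('<?xml version="1.0"?><!DOCTYPE kaboom [<!ENTITY a "aaaa">]>'
    . '<kaboom>x&a;y&a;z</kaboom>');
stripEntityReferences($doc);
echo $doc->saveXML();
```

Because no entity reference nodes remain, the output is well-formed and C14N() no longer has anything to resolve; replacing each reference with a placeholder text node would be an alternative policy.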

Some figures (I reduced the file size, since otherwise it takes too long):

LIBXML_NOENT disabled                                       LIBXML_NOENT enabled
Mem: 356 184 (Peak: 435 464)                                Mem: 356 280 (Peak: 435 464)
Loaded file quadratic-blowup-2.xml into string.             Loaded file quadratic-blowup-2.xml into string.
Mem: 368 400 (Peak: 435 464)                                Mem: 368 496 (Peak: 435 464)
DOMDocument loaded XML 11 881 bytes in 0.001368 secs.       DOMDocument loaded XML 11 881 bytes in 15.993627 secs.
Mem: 369 088 (Peak: 435 464)                                Mem: 369 184 (Peak: 435 464)
Removed load string.                                        Removed load string.
Mem: 357 112 (Peak: 435 464)                                Mem: 357 208 (Peak: 435 464)
Got XML (saveXML()), length: 11 880                         Got XML (saveXML()), length: 11 165 132
Got Text (nodeValue), length: 11 160 314; 11.060893 secs.   Got Text (nodeValue), length: 11 160 314; 0.025360 secs.
Mem: 11 517 776 (Peak: 11 532 016)                          Mem: 11 517 872 (Peak: 22 685 360)

I have not yet made up my mind about protection strategies, but I now know that loading the Billion Laughs file into PhpStorm, for example, will freeze it. I stopped testing the latter because I didn't want it to freeze while writing this.


You should actually test your application with sample documents and see if it is vulnerable.

The underlying library for PHP's XML libraries is libxml2. Its behavior is controlled from PHP mostly through optional constants, which most of the libraries accept as an argument when loading XML.

You can determine the libxml2 version your PHP uses with echo LIBXML_DOTTED_VERSION;

In later versions (after 2.6), libxml2 contains entity substitution limits designed to prevent both exponential and quadratic attacks. These can be overridden with the LIBXML_PARSEHUGE option.

By default, libxml2 does not load a DTD, add default attributes, or perform entity substitution. So the default behavior is to ignore DTDs.

You can turn parts of this on like so:

  • LIBXML_DTDLOAD will load DTDs.
  • LIBXML_NONET will disable network loading of DTDs. You should always have this on and use libxml's DTD catalog to load DTDs.
  • LIBXML_DTDVALID will perform DTD validation while parsing.
  • LIBXML_NOENT will perform entity substitution.
  • LIBXML_DTDATTR will add default attributes.
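Since these constants are bit flags combined with `|`, an application could enforce a policy on whatever options reach its loader. The sketch below is one possible policy, an assumption rather than anything PHP or libxml2 enforces: it strips the flags listed above that enable entity substitution or DTD processing, strips LIBXML_PARSEHUGE so libxml2's built-in limits stay in force, and always adds LIBXML_NONET. `hardenedLibxmlOptions()` is a hypothetical helper.

```php
<?php
// Sketch of one possible (assumed) policy: accept caller-supplied libxml options
// but strip flags that enable entity substitution, DTD loading/validation,
// default attributes, or lifting libxml2's size limits; always add LIBXML_NONET.
// hardenedLibxmlOptions() is a hypothetical helper, not part of PHP.

const RISKY_LIBXML_FLAGS = LIBXML_NOENT | LIBXML_DTDLOAD | LIBXML_DTDVALID
    | LIBXML_DTDATTR | LIBXML_PARSEHUGE;

function hardenedLibxmlOptions(int $requested): int
{
    // Clear the risky bits, keep everything else, force network access off.
    return ($requested & ~RISKY_LIBXML_FLAGS) | LIBXML_NONET;
}

$doc = new DOMDocument();
$doc->loadXML('<root> <child/> </root>', hardenedLibxmlOptions(LIBXML_NOENT | LIBXML_NOBLANKS));
echo $doc->saveXML();
```

Here LIBXML_NOENT is silently dropped while the harmless LIBXML_NOBLANKS survives; whether to drop or reject risky flags outright is a design choice left to the application.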

So with the default settings, PHP/libxml2 is probably not vulnerable to any of these issues, but the only way to know for sure is to test.
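A minimal version of such a test: load tiny attack samples with the options your application actually uses and check that the root element's first child is still an entity reference node, meaning nothing was substituted into the tree. The `$appOptions` value and the sample sizes are assumptions; replace them with your own. Note that, as shown earlier, `nodeValue` still expands references, so this only checks the loading step.

```php
<?php
// Sketch: load small attack samples with the options your application uses and
// check that entity references are *not* substituted into the tree.
// $appOptions and the sample sizes are assumptions; substitute your own.

$appOptions = LIBXML_NONET; // hypothetical: whatever your loader passes

$samples = [
    'quadratic' => '<?xml version="1.0"?><!DOCTYPE kaboom [<!ENTITY a "aaaaaaaaaa">]>'
        . '<kaboom>' . str_repeat('&a;', 10) . '</kaboom>',
    'laughs'    => '<?xml version="1.0"?><!DOCTYPE lolz [<!ENTITY lol "lol">'
        . '<!ENTITY lol1 "&lol;&lol;&lol;">]><lolz>&lol1;</lolz>',
];

libxml_use_internal_errors(true);
$results = [];
foreach ($samples as $name => $xml) {
    $doc = new DOMDocument();
    $doc->loadXML($xml, $appOptions);
    $first = $doc->documentElement !== null ? $doc->documentElement->firstChild : null;
    // An entity reference node means the entity was kept, not expanded.
    $results[$name] = $first !== null && $first->nodeType === XML_ENTITY_REF_NODE;
    printf("%-10s %s\n", $name, $results[$name] ? 'not substituted' : 'SUBSTITUTED');
}
libxml_clear_errors();
```

Running the same loop with LIBXML_NOENT in `$appOptions` should report both samples as substituted, which is a quick way to confirm the check itself works.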