What does LIBXML_NOENT do (and why isn't it called LIBXML_ENT)?
Q: What exactly does the LIBXML_NOENT flag do?
The flag enables the substitution of XML entity references, whether the entities are internal or external.
Q: Why is it called LIBXML_NOENT? What is it short for, and wouldn't LIBXML_ENT or LIBXML_PARSE_EXTERNAL_ENTITIES be a better fit?
The name is indeed misleading. I think that NOENT simply means that the node tree of the parsed document won't contain any entity reference nodes, so the parser will substitute entities. Without NOENT, the parser creates DOMEntityReference nodes for entity references.
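The difference can be seen by inspecting the first child of the root element with and without the flag. This is only a small sketch using an internal entity; the variable names are made up for illustration.

```php
<?php
$xml = '<!DOCTYPE test [<!ENTITY c "TEST">]><test>&c;</test>';

// Without LIBXML_NOENT: the reference is kept as a node in the tree.
$kept = new DOMDocument();
$kept->loadXML($xml);
$keptChild = $kept->documentElement->firstChild;
echo get_class($keptChild), "\n";        // DOMEntityReference

// With LIBXML_NOENT: the parser substitutes the entity's replacement text.
$expanded = new DOMDocument();
$expanded->loadXML($xml, LIBXML_NOENT);
$expandedChild = $expanded->documentElement->firstChild;
echo get_class($expandedChild), "\n";    // DOMText
echo $expandedChild->nodeValue, "\n";    // TEST
```

Read this way, NOENT is closer to "no entity reference nodes in the result" than to "no entity processing".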
Q: Is there a flag that actually prevents the parsing of all entities?
LIBXML_NOENT enables the substitution of all entity references. If you don't want entities to be expanded, simply omit the flag. For example:
    $xml = '<!DOCTYPE test [<!ENTITY c "TEST">]><test>&c;</test>';
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    echo $dom->saveXML();
prints
    <?xml version="1.0"?>
    <!DOCTYPE test [
    <!ENTITY c "TEST">
    ]>
    <test>&c;</test>
It seems that textContent replaces entities on its own, which might be a peculiarity of the PHP bindings. Without LIBXML_NOENT, this leads to different behavior for internal and external entities, because the latter won't be loaded.
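To check the textContent observation on your own setup, you can compare the serialized document with the element's textContent when the flag is omitted. This is a sketch for the internal-entity case only, and the result of textContent may depend on the PHP and libxml2 versions in use.

```php
<?php
$xml = '<!DOCTYPE test [<!ENTITY c "TEST">]><test>&c;</test>';

$dom = new DOMDocument();
$dom->loadXML($xml); // no LIBXML_NOENT

// Serialization keeps the reference as written in the source.
$serialized = $dom->saveXML($dom->documentElement);
echo $serialized, "\n"; // <test>&c;</test>

// textContent is computed from the tree; per the observation above,
// an internal entity appears to be replaced here even without the flag.
$text = $dom->documentElement->textContent;
var_dump($text);
```

If the two results differ, the expansion is happening when the text is read, not when the document is parsed.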