PHP DOM textContent vs nodeValue?
I finally wanted to know the difference as well, so I dug into the source and found the answer; in most cases there will be no discernible difference, but there are a bunch of edge cases you should be aware of.
Both ->nodeValue and ->textContent are identical for the following classes (node types):
The ->nodeValue property yields NULL for the following classes (node types):
The ->textContent property is non-existent for the following classes:
DOMNameSpaceNode (not documented, but can be found with the //namespace::* selector)
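To see where namespace nodes come from, here is a minimal sketch that selects them through the XPath namespace axis and reports their class. The sample document, the prefix "a" and its URI are made up for illustration, and isset() is used so that a missing property is probed without raising a warning; the result of that check may differ between PHP versions.

```php
<?php
// Hypothetical sample document; prefix "a" and its URI are illustrative only.
$doc = new DOMDocument();
$doc->loadXML('<root xmlns:a="urn:example:a"><a:child/></root>');

$xpath = new DOMXPath($doc);

// The namespace axis (namespace::*) selects namespace nodes, not elements.
$classes = [];
foreach ($xpath->query('//namespace::*') as $ns) {
    $classes[] = get_class($ns);
    printf(
        "%s %s: textContent available? %s\n",
        get_class($ns),
        $ns->nodeName,
        var_export(isset($ns->textContent), true)
    );
}
```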
The ->nodeValue property is non-existent for the following classes:
See also: dom_node_node_value_read() and dom_node_text_content_read()
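A quick way to check the behaviour on your own PHP version is to dump both properties for a handful of node types. This is only a sketch against a made-up sample document, and it covers a few common classes rather than every node type.

```php
<?php
// Hypothetical sample document used only to obtain one node of each type.
$doc = new DOMDocument();
$doc->loadXML('<root id="x"><!-- note -->text<child>inner</child></root>');

$root = $doc->documentElement;
$nodes = [
    'DOMDocument' => $doc,
    'DOMElement'  => $root,
    'DOMAttr'     => $root->getAttributeNode('id'),
    'DOMComment'  => $root->childNodes->item(0),
    'DOMText'     => $root->childNodes->item(1),
];

// Print both properties side by side for each node type.
foreach ($nodes as $label => $node) {
    printf(
        "%-12s nodeValue=%s textContent=%s\n",
        $label,
        var_export($node->nodeValue, true),
        var_export($node->textContent, true)
    );
}
```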
Hope this will make sense:
$doc = new DOMDocument();
$doc->loadXML('<body><!-- test --><node attr="test1">old content<h1>test</h1></node></body>');
var_dump($doc->textContent);
var_dump($doc->nodeValue);
var_dump($doc->firstChild->textContent);
var_dump($doc->firstChild->nodeValue);
Output:
string(15) "old contenttest"
NULL
string(15) "old contenttest"
string(15) "old contenttest"
Because: nodeValue - The value of this node, depending on its type
Both textContent and nodeValue return unescaped text; i.e. &lt; becomes <.
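In PHP terms, the unescaping can be seen with a small sketch like the one below. The sample markup is made up for illustration; serializing the element with saveXML() shows the escaped form again.

```php
<?php
// Hypothetical sample: the text contains escaped <, > and & characters.
$doc = new DOMDocument();
$doc->loadXML('<p>1 &lt; 2 &amp;&amp; 3 &gt; 2</p>');
$p = $doc->documentElement;

// Both properties hand back the decoded characters.
$viaTextContent = $p->textContent;
$viaNodeValue   = $p->firstChild->nodeValue; // the single text node inside <p>

var_dump($viaTextContent);
var_dump($viaNodeValue);

// Serializing escapes the special characters again.
$serialized = $doc->saveXML($p);
echo $serialized, "\n";
```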
textContent concatenates the content of all child nodes. This is an important distinction; for example, in Chrome the maximum length of nodeValue is 65536 characters (not bytes). If you have already set the content of a node to something longer than that, you will need to iterate over the child nodes if you want to use nodeValue, whereas textContent will perform the concatenation for you.
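The statement above is about the browser DOM, but the same idea can be sketched in PHP: walking the children and joining each text node's nodeValue by hand gives what textContent returns in one step. collectText() is a hypothetical helper written for this illustration, and the sample markup is made up.

```php
<?php
// Hypothetical helper: recursively joins the nodeValue of text and CDATA
// nodes below $node, skipping comments and other node types.
function collectText(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            // DOMCdataSection extends DOMText, so CDATA is covered here too.
            $out .= $child->nodeValue;
        } elseif ($child instanceof DOMElement) {
            $out .= collectText($child);
        }
    }
    return $out;
}

$doc = new DOMDocument();
$doc->loadXML('<node>old <![CDATA[raw]]><!-- skip --><h1>test</h1> tail</node>');
$node = $doc->documentElement;

$manual = collectText($node);
$direct = $node->textContent;

var_dump($manual);
var_dump($direct);
```

If the two values ever differ for your documents, the difference will come from node types the helper skips on purpose, such as entity references.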
As discussed, there are also several DOM classes that do not support nodeValue but do support textContent.
nodeValue is faster because it reads a single node's value instead of concatenating descendants; however, don't use it unless you know exactly what the node structure is.