How to strip a tag and all of its inner html using the tag's id?
Using the native XML Manipulation Library
Assuming that your html content is stored in the variable $html:
$html='<html> <body> bla bla bla bla <div id="myDiv"> more text <div id="anotherDiv"> And even more text </div> </div> bla bla bla </body></html>';
To delete the tag by ID use the following code:
$dom = new DOMDocument;
$dom->validateOnParse = false;
$dom->loadHTML( $html );

// get the tag
$div = $dom->getElementById('anotherDiv');

// delete the tag
if( $div && $div->nodeType==XML_ELEMENT_NODE ){
    $div->parentNode->removeChild( $div );
}

echo $dom->saveHTML();
Note that certain versions of libxml require a doctype to be present in order to use the getElementById method.
In that case, you can prepend $html with <!doctype>:
$html = '<!doctype>' . $html;
Alternatively, as suggested by Gordon's answer, you can use DOMXPath to find the element with an XPath query:
$dom = new DOMDocument;
$dom->validateOnParse = false;
$dom->loadHTML( $html );

$xp = new DOMXPath( $dom );
$col = $xp->query( '//div[ @id="anotherDiv" ]' );

if( !empty( $col ) ){
    foreach( $col as $node ){
        $node->parentNode->removeChild( $node );
    }
}

echo $dom->saveHTML();
The first method works regardless of the tag. If you want to use the second method with the same id but a different tag, let's say form, simply replace //div in //div[ @id="anotherDiv" ] with //form.
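If the tag name is not known in advance, the XPath approach can also be made tag-independent by matching any element with the wildcard step //*. The following is only a minimal sketch: the function name removeElementById is a hypothetical helper, not part of the original answer, and it assumes the id contains no double-quote character, since it is embedded directly into the query string.

```php
<?php
// Hypothetical helper for illustration: removes every element whose id
// attribute equals $id, whatever its tag name, and returns the resulting HTML.
function removeElementById(string $html, string $id): string
{
    $dom = new DOMDocument;
    $dom->loadHTML($html);

    $xp = new DOMXPath($dom);
    // Assumption: $id contains no double quote, so it can be embedded as-is.
    $nodes = $xp->query('//*[@id="' . $id . '"]');

    // query() returns false when the expression is invalid.
    if ($nodes !== false) {
        foreach ($nodes as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    return $dom->saveHTML();
}

$html = '<html><body>before <form id="anotherDiv">inner text</form> after</body></html>';
$result = removeElementById($html, 'anotherDiv');
echo $result;
```

Here the form element and its inner text are removed even though the query never names the form tag, while the surrounding text is left in place.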