How can I prevent XML::XPath from fetching a DTD while processing an XML file?

How can I prevent XML::XPath from fetching a DTD while processing an XML file?


XML::XPath is based on XML::Parser. XML::Parser has an option, NoLWP, that tells it not to use LWP to resolve external entities (such as DTDs), and XML::XPath lets you pass in an XML::Parser object to use as the parser.

So you can write this:

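    use XML::Parser;
    use XML::XPath;

    # NoLWP => 1 stops XML::Parser from using LWP to fetch external entities
    my $p  = XML::Parser->new( NoLWP => 1 );
    my $xp = XML::XPath->new( parser => $p, filename => "a.xhtml" );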

Note that in this case you will lose all entities except numeric ones and the predefined ones (&gt;, &lt;, &amp;, &apos; and &quot;). The parser will not complain, but they will disappear silently (try including &alpha; in the table and printing it, for example).

As a matter of fact you probably should not use XML::XPath, which is not actively maintained.

Try XML::LibXML if you have no problem installing libxml2. Its interface is very similar to XML::XPath, as they both implement the DOM. XML::LibXML is also much more powerful than XML::XPath, and faster to boot. If you want an expat/XML::Parser-based module, then you might want to have a look at XML::Twig (that's blatant self-promotion as I am the author of the module, sorry). Also, for HTML or dodgy XHTML, you can use HTML::TreeBuilder, which, with the addition of HTML::TreeBuilder::XPath (also by me), supports XPath.
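For the XML::LibXML route, a minimal sketch might look like the following. It assumes the load_ext_dtd and no_network parser options are enough to keep the parser off the network; the inline document and the "x" namespace prefix are made up for illustration. Because the external DTD is skipped, named entities such as &alpha; would not be defined here either.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Illustrative XHTML document (not from the original question).
my $xhtml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>Hello</p></body>
</html>
XML

# load_ext_dtd => 0 skips the external DTD subset;
# no_network => 1 forbids network access as a second line of defence.
my $doc = XML::LibXML->load_xml(
    string       => $xhtml,
    load_ext_dtd => 0,
    no_network   => 1,
);

# XHTML elements live in a namespace, so bind a prefix for XPath.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs( x => 'http://www.w3.org/1999/xhtml' );

my $title = $xpc->findvalue('/x:html/x:head/x:title');
print "$title\n";
```

To read from a file instead, replace the string argument with location => "a.xhtml".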


porneL's response seems to be the Right Thing here. (www.w3.org has started taking 30 seconds to respond to each of my queries (when it doesn't just give up), and when XML::XPath ends up retrieving the full XHTML set…!) Further, mirod's idea works, too:

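    use XML::Parser;
    use XML::XPath;
    use XML::Catalog;

    my $parser = XML::Parser->new;
    # Resolve external entities through the local SGML Open Catalog
    my $catalog_handler = XML::Catalog->new("xhtml1-20020801/DTD/xhtml.soc")->get_handler($parser);
    $parser->setHandlers( ExternEnt => $catalog_handler );
    # $xml holds the XHTML document as a string
    my $xp = XML::XPath->new( xml => $xml, parser => $parser );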

Add a copy of "The complete set of DTD files together with an XML declaration and SGML Open Catalog" from ⟨URL:http://www.w3.org/TR/xhtml1/dtds.html⟩ and enjoy!


Usually it's done by setting up a local XML catalog.

libxml-based parsers support catalogs, so if you follow mirod's advice to use XML::LibXML, you'll be able to get named entities and validation to work without network access.
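As a rough sketch of the catalog approach with XML::LibXML, assuming you already have an XML catalog file that maps the XHTML public identifiers to local copies of the DTDs: parse_with_catalog and the command-line paths are hypothetical names used only for illustration, and load_catalog requires a libxml2 built with catalog support.

```perl
use strict;
use warnings;
use XML::LibXML;

# Parse $file, resolving its DTD through the local catalog at $catalog_path.
# (Hypothetical helper, not part of XML::LibXML itself.)
sub parse_with_catalog {
    my ( $catalog_path, $file ) = @_;

    my $parser = XML::LibXML->new(
        load_ext_dtd    => 1,    # read the DTD so named entities are defined
        expand_entities => 1,    # replace entity references with their text
        no_network      => 1,    # never fall back to fetching over the network
    );

    # Register the local catalog with libxml2.
    $parser->load_catalog($catalog_path);

    return $parser->load_xml( location => $file );
}

# Assumed usage: perl script.pl /path/to/catalog.xml a.xhtml
if ( @ARGV == 2 ) {
    my $doc = parse_with_catalog(@ARGV);
    print $doc->documentElement->nodeName, "\n";
}
```

libxml2 also honours the XML_CATALOG_FILES environment variable, which may be simpler when you cannot change the code that creates the parser.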