Is Scala/Java not respecting w3 "excess dtd traffic" specs?


I've bumped into the same issue and haven't found an elegant solution (I'm thinking of posting the question to the Scala mailing list). Meanwhile, I found a workaround: implement your own SAXParserFactoryImpl so that you can set parser features such as http://apache.org/xml/features/disallow-doctype-decl yourself. The good thing is that it doesn't require any change to the Scala code base (I agree that it should be fixed there, though).

First, I'm extending the default parser factory:

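    package mypackage;

    import javax.xml.parsers.ParserConfigurationException;
    import org.xml.sax.SAXNotRecognizedException;
    import org.xml.sax.SAXNotSupportedException;
    // JDK-internal Xerces factory (the same class the Scala version further down imports)
    import com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl;

    public class MyXMLParserFactory extends SAXParserFactoryImpl {
        public MyXMLParserFactory() throws SAXNotRecognizedException, SAXNotSupportedException, ParserConfigurationException {
            super();
            // No validation and no DTD loading; documents with a DOCTYPE are still accepted.
            super.setFeature("http://xml.org/sax/features/validation", false);
            super.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false);
            super.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
            super.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        }
    }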

Nothing special; I just want the chance to set the features.

(Note that this is plain Java code; most probably you can write the same in Scala too.)

And in your Scala code, you need to configure the JVM to use your new factory:

System.setProperty("javax.xml.parsers.SAXParserFactory", "mypackage.MyXMLParserFactory");

Then you can call XML.load without validation.
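To see what these four features do without subclassing anything, here is a minimal Java sketch that sets them on whatever factory the JVM hands out and then parses a document whose DOCTYPE points at a DTD that does not exist. The class name DtdSkippingDemo, its methods, and the bogus file:///no/such/dir/missing.dtd location are made up for illustration, and I'm assuming the JDK's bundled Xerces-based parser, which recognizes the Apache feature URIs.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the original answer.
final class DtdSkippingDemo {

    // Builds a factory with the same four feature settings as MyXMLParserFactory.
    static SAXParserFactory newLenientFactory() {
        try {
            SAXParserFactory f = SAXParserFactory.newInstance();
            f.setFeature("http://xml.org/sax/features/validation", false);
            f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false);
            f.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
            f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            return f;
        } catch (ParserConfigurationException | SAXException e) {
            // Thrown when the underlying parser does not know one of the features.
            throw new IllegalStateException("feature not supported by this parser", e);
        }
    }

    // Parses the XML text and returns the qualified name of the root element.
    static String parseRootName(String xml) {
        final String[] root = new String[1];
        try {
            SAXParser parser = newLenientFactory().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (root[0] == null) {
                        root[0] = qName; // remember only the first element seen
                    }
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return root[0];
    }

    public static void main(String[] args) {
        // The DTD location is deliberately unreachable; the parser should never try to open it.
        String xml = "<!DOCTYPE html SYSTEM \"file:///no/such/dir/missing.dtd\">"
                   + "<html><body>hello</body></html>";
        System.out.println(parseRootName(xml));
    }
}
```

If the parse succeeds even though the DTD cannot be found, the parser is no longer fetching it, which is the whole point of the workaround.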


Leaving the actual problem aside for now, what do you expect to happen if the request function returns false below?

    def fetchAndParseURL(URL: String) = {
      val (true, body) = Http request(URL)

An exception (a scala.MatchError) will be thrown, because the pattern (true, body) does not match a result whose first element is false. You could rewrite it this way, though:

    def fetchAndParseURL(URL: String) = (Http request(URL)) match {
      case (true, body) =>
        val xml = XML.load(body)
        "True"
      case _ => "False"
    }

Now, to fix the XML parsing problem, we'll disable DTD loading in the parser, as suggested by others:

    def fetchAndParseURL(URL: String) = (Http request(URL)) match {
      case (true, body) =>
        val f = javax.xml.parsers.SAXParserFactory.newInstance()
        f.setNamespaceAware(false)
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
        val MyXML = XML.withSAXParser(f.newSAXParser())
        val xml = MyXML.load(body)
        "True"
      case _ => "False"
    }
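One thing worth checking before relying on this setting: as far as I can tell, with the JDK's bundled parser, turning disallow-doctype-decl on does not skip the DTD. It makes the parser reject any document that contains a DOCTYPE at all. The Java sketch below tries one document without a DOCTYPE and one with it. The class DoctypeRejectionDemo, its methods, and the unreachable DTD location are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the original answer.
final class DoctypeRejectionDemo {

    // Same factory settings as the Scala snippet: not namespace-aware, DOCTYPE disallowed.
    static SAXParserFactory newStrictFactory() {
        try {
            SAXParserFactory f = SAXParserFactory.newInstance();
            f.setNamespaceAware(false);
            f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return f;
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("feature not supported by this parser", e);
        }
    }

    // Returns true if the document parses, false if the parser reports a fatal parse error.
    static boolean accepts(String xml) {
        try {
            newStrictFactory().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false; // DefaultHandler rethrows fatal errors such as a forbidden DOCTYPE
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("unexpected failure", e);
        }
    }

    public static void main(String[] args) {
        String plain = "<root><child/></root>";
        String withDoctype = "<!DOCTYPE root SYSTEM \"file:///no/such/dir/missing.dtd\"><root/>";
        System.out.println("plain accepted: " + accepts(plain));
        System.out.println("doctype accepted: " + accepts(withDoctype));
    }
}
```

So if the pages you load do carry a DOCTYPE (most XHTML does), the load-external-dtd feature set to false is probably the setting you want instead.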

I put the MyXML code inside fetchAndParseURL only to keep the structure of the example as close to the original as possible. For actual use, I'd move it into a top-level object and make "parser" a def instead of a val, so that every load gets a fresh parser and you avoid problems with mutable parsers:

    import scala.xml.Elem
    import scala.xml.factory.XMLLoader
    import javax.xml.parsers.SAXParser

    object MyXML extends XMLLoader[Elem] {
      override def parser: SAXParser = {
        val f = javax.xml.parsers.SAXParserFactory.newInstance()
        f.setNamespaceAware(false)
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
        f.newSAXParser()
      }
    }

Import the package it is defined in, and you are good to go.
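The "def instead of val" point carries over to plain Java as "build a new parser on every call instead of caching one in a field". A minimal sketch, assuming the same factory settings as the MyXML object; the class FreshParsers and its newParser method are hypothetical names:

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical helper; not part of the original answer.
final class FreshParsers {

    // Java analogue of "def parser": each call configures a factory and
    // returns a brand-new SAXParser, so no parser state is shared between loads.
    static SAXParser newParser() {
        try {
            SAXParserFactory f = SAXParserFactory.newInstance();
            f.setNamespaceAware(false);
            f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return f.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("could not create SAX parser", e);
        }
    }

    public static void main(String[] args) {
        SAXParser first = newParser();
        SAXParser second = newParser();
        // Two calls should give two distinct parser objects.
        System.out.println("distinct instances: " + (first != second));
    }
}
```

Creating a parser per load costs a little more than reusing one, but it sidesteps any question of whether a parser can safely be reused or shared between threads.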


GClaramunt's solution worked wonders for me. My Scala conversion is as follows:

    package mypackage

    import org.xml.sax.{SAXNotRecognizedException, SAXNotSupportedException}
    import com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
    import javax.xml.parsers.ParserConfigurationException

    @throws(classOf[SAXNotRecognizedException])
    @throws(classOf[SAXNotSupportedException])
    @throws(classOf[ParserConfigurationException])
    class MyXMLParserFactory extends SAXParserFactoryImpl() {
      super.setFeature("http://xml.org/sax/features/validation", false)
      super.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
      super.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false)
      super.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
    }

As mentioned in the original post, it is necessary to place the following line somewhere in your code:

System.setProperty("javax.xml.parsers.SAXParserFactory", "mypackage.MyXMLParserFactory")
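If you want to confirm that the property actually took effect, a quick check is to ask the JVM which factory class it hands out. This is only a sketch; the class ActiveFactoryCheck and its methods are hypothetical names, and the output depends on your JVM and classpath.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper; not part of the original answer.
final class ActiveFactoryCheck {

    static final String PROPERTY = "javax.xml.parsers.SAXParserFactory";

    // Class name of the factory that SAXParserFactory.newInstance() currently returns.
    static String actualFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // Compares the configured property (null when unset) with the factory actually in use.
    static String describe() {
        return "configured=" + System.getProperty(PROPERTY)
             + ", actual=" + actualFactoryClass();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

With the property set as above, the "actual" part should name mypackage.MyXMLParserFactory; if it names something else, the property was not picked up.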