Parsing xml with DOM, DOCTYPE gets erased


Your input XML is not valid. It should be:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris [
    <!ELEMENT favoris (station)+>
    <!ELEMENT station (#PCDATA)>
    <!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">test1</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>

As @DevNull wrote, for the document to be fully valid you can't write <station id="5">test1</station>, because a value of type ID must not start with a digit (Java still parses it despite that issue).


The DOCTYPE is erased in the output XML document:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>

I haven't found a solution for the missing inline DTD yet, but as a workaround you can reference an external DTD:

xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "favoris.dtd");

Result (example) document:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris SYSTEM "favoris.dtd">
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>
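For context, here is a minimal end-to-end sketch of that workaround. The class name `ExternalDoctypeDemo`, the method `writeWithSystemDoctype`, and the in-memory input string are made up for illustration (the question reads from a file); only the `setOutputProperty` call comes from the line above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExternalDoctypeDemo {
    // Hypothetical in-memory input with an inline DTD, modelled on the question.
    static final String INPUT =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<!DOCTYPE favoris ["
        + "<!ELEMENT favoris (station)+>"
        + "<!ELEMENT station (#PCDATA)>"
        + "<!ATTLIST station id ID #REQUIRED>"
        + "]>"
        + "<favoris><station id=\"i5\">test1</station></favoris>";

    static String writeWithSystemDoctype(String xml, String systemId) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Change the first station's text, as in the question's output.
        dom.getElementsByTagName("station").item(0).setTextContent("new value");

        // The Transformer drops the inline DTD; ask it to emit a SYSTEM reference instead.
        Transformer xformer = TransformerFactory.newInstance().newTransformer();
        xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, systemId);

        StringWriter out = new StringWriter();
        xformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeWithSystemDoctype(INPUT, "favoris.dtd"));
    }
}
```

Note that the declarations themselves then have to live in a separate favoris.dtd file next to the output document.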

EDIT:

I don't think it's possible to save an inline DTD using the Transformer class. If you can't use an external DTD reference, you can use the DOM Level 3 LSSerializer class instead:

import java.io.FileOutputStream;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
...
// dom is the already parsed org.w3c.dom.Document
DOMImplementationLS domImplementationLS =
    (DOMImplementationLS) dom.getImplementation().getFeature("LS", "3.0");
LSOutput lsOutput = domImplementationLS.createLSOutput();
FileOutputStream outputStream = new FileOutputStream("output.xml");
lsOutput.setByteStream(outputStream);
LSSerializer lsSerializer = domImplementationLS.createLSSerializer();
lsSerializer.write(dom, lsOutput);
outputStream.close();

Output with the desired DTD (I can't see any option to add standalone="yes" using LSSerializer):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE favoris [
<!ELEMENT favoris (station)+>
<!ELEMENT station (#PCDATA)>
<!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>
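To try the LSSerializer approach without touching the file system, a self-contained sketch might look like the following. The class name `InlineDtdLsDemo`, the method `serializeWithLs`, and the in-memory input are illustrative assumptions. It writes to a byte array instead of output.xml and makes the parser namespace-aware, which is my own precaution rather than something the original snippet does.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class InlineDtdLsDemo {
    // Hypothetical in-memory input with an inline DTD, modelled on the question.
    static final String INPUT =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<!DOCTYPE favoris ["
        + "<!ELEMENT favoris (station)+>"
        + "<!ELEMENT station (#PCDATA)>"
        + "<!ATTLIST station id ID #REQUIRED>"
        + "]>"
        + "<favoris><station id=\"i5\">test1</station></favoris>";

    static String serializeWithLs(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // assumption: keeps the serializer on DOM Level 2+ nodes
        Document dom = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        DOMImplementationLS ls =
            (DOMImplementationLS) dom.getImplementation().getFeature("LS", "3.0");

        // Collect the serialized bytes in memory instead of a file.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        LSOutput lsOutput = ls.createLSOutput();
        lsOutput.setByteStream(bytes);
        lsOutput.setEncoding("UTF-8");

        LSSerializer serializer = ls.createLSSerializer();
        serializer.write(dom, lsOutput);
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serializeWithLs(INPUT));
    }
}
```

The DTD in the output is rebuilt from the parsed declarations, so its whitespace and layout may differ from the input.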

Another approach is to use the Apache Xerces2-J XMLSerializer class:

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
...
XMLSerializer serializer = new XMLSerializer();
serializer.setOutputCharStream(new java.io.FileWriter("output.xml"));
OutputFormat format = new OutputFormat();
format.setStandalone(true);
serializer.setOutputFormat(format);
serializer.serialize(dom);

Result:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris [
<!ELEMENT favoris (station)+>
<!ELEMENT station (#PCDATA)>
<!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>


(This response is really just a supplement to @Grzegorz Szpetkowski's answer, explaining why it works.)

You lose the doctype definition because you use the Transformer class, which performs an XSL transformation. The XSLT tree model has no object or node for the DOCTYPE declaration or the document type definition. When a parser hands the document over to an XSLT processor, the doctype information is lost and therefore cannot be retained or duplicated. XSLT does offer some control over the serialization of the output tree, including adding a <!DOCTYPE ... > declaration with a public or system identifier, but the values of these identifiers must be known beforehand and cannot be read from the input tree. Creating or retaining an embedded DTD or entity declarations is not supported either (one workaround for this obstacle is to output it as text with disable-output-escaping="yes").

To preserve the DTD, you need to output your document with an XML serializer instead of an XSL transformation, as Grzegorz already suggested.
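The difference can be observed directly: the DOM still holds the inline DTD, while an identity transformation of the same document does not write it out. The sketch below is only an illustration; the class `DoctypeLossDemo` and its helper methods are hypothetical names, and the input is an in-memory stand-in for the question's file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

public class DoctypeLossDemo {
    // Hypothetical in-memory input with an inline DTD, modelled on the question.
    static final String INPUT =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<!DOCTYPE favoris ["
        + "<!ELEMENT favoris (station)+>"
        + "<!ELEMENT station (#PCDATA)>"
        + "<!ATTLIST station id ID #REQUIRED>"
        + "]>"
        + "<favoris><station id=\"i5\">test1</station></favoris>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // The DOM keeps the inline DTD as text on the DocumentType node.
    static String internalSubset(Document dom) {
        DocumentType doctype = dom.getDoctype();
        return doctype == null ? null : doctype.getInternalSubset();
    }

    // An identity transformation: no stylesheet, DOM in, text out.
    static String identityTransform(Document dom) throws Exception {
        Transformer xformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        xformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document dom = parse(INPUT);
        System.out.println("DOM internal subset:\n" + internalSubset(dom));
        System.out.println("Transformer output:\n" + identityTransform(dom));
    }
}
```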


@Grzegorz Szpetkowski's suggestion to use an external DTD is a good one. However, the XML is still invalid if you keep those station/@id values.

An attribute of type "ID" can't have a value that starts with a digit. You'll have to prefix it with something, like "s" for station:

<!DOCTYPE favoris [
<!ELEMENT favoris (station*)>
<!ELEMENT station (#PCDATA)>
<!ATTLIST station
          id ID #REQUIRED>
]>
<favoris>
  <station id="s5">test1</station>
  <station id="s6">test1</station>
  <station id="s8">test1</station>
</favoris>
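If you want to check this yourself, one option is to parse with DTD validation switched on and collect the reported errors. This is a rough sketch under my own assumptions: `IdValidationDemo`, `favoris`, and `validationErrors` are hypothetical names, and the exact error messages depend on the parser implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class IdValidationDemo {
    // Builds a small document with the inline DTD and a single station id.
    static String favoris(String id) {
        return "<!DOCTYPE favoris ["
            + "<!ELEMENT favoris (station*)>"
            + "<!ELEMENT station (#PCDATA)>"
            + "<!ATTLIST station id ID #REQUIRED>"
            + "]>"
            + "<favoris><station id=\"" + id + "\">test1</station></favoris>";
    }

    // Parses with DTD validation enabled and returns the validity errors reported.
    static List<String> validationErrors(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // off by default, which is why id="5" normally parses fine
        DocumentBuilder builder = factory.newDocumentBuilder();

        final List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("id=\"5\":  " + validationErrors(favoris("5")));
        System.out.println("id=\"s5\": " + validationErrors(favoris("s5")));
    }
}
```

With validation enabled, the first document should report at least one error for the ID value, while the second should report none.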