Unable to unmarshall \u0000 after successfully marshalling it [closed]


Why does the JAXB/DOM API allow creating invalid XML documents that it cannot read back? Shouldn't it fail fast during marshalling?

  1. You would need to ask the implementors.

  2. It is possible that they thought the cost of checking every serialised data character was not justified ... especially if the parser is then going to check them all over again.

  3. Having implemented the serializer this way (whether deliberately or by mistake), they could not switch to strict checking by default without breaking existing code that depends on being able to serialise illegal XML.

But shouldn't a mature XML stack in Java (I'm using 1.7.0_05) handle this, either by default or via some simple setting?

Not necessarily ... if you accept reason #2 above. Even a simple setting could have a measurable impact on performance.


Also, 0 (neither raw nor escaped) is not accepted by any XML parser or by xmllint ...

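Quite rightly so! It is forbidden by the XML spec: U+0000 is not a legal character in either XML 1.0 or XML 1.1, so it may not appear literally or as a character reference.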
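To make the rule concrete, here is a minimal Java sketch of the XML 1.0 Char production. The class XmlChars and its methods are hypothetical names chosen for illustration, not part of JAXB or the JDK. Because the spec's "Legal Character" constraint also applies to character references, the same predicate decides whether a character may appear raw or as &#N;, so U+0000 fails either way.

```java
/**
 * Hypothetical helper encoding the XML 1.0 "Char" production:
 * Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 */
public final class XmlChars {
    private XmlChars() {}

    /** True if the code point may appear in an XML 1.0 document, literally or as a character reference. */
    public static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Index of the first illegal code point in s, or -1 if every code point is legal. */
    public static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);  // a lone surrogate comes back as itself and fails the check
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(isXmlChar(0x0));                  // false: NUL
        System.out.println(isXmlChar(0xB));                  // false: vertical tab
        System.out.println(firstIllegalIndex("ab\u0000c"));  // 2
    }
}
```

Note that U+000B (vertical tab) is rejected too, which is why the question runs into the same problem with it.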

However, a more interesting test would be to see what happens when you try to generate XML containing an illegal character using other XML stacks.
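As a baseline for that comparison, the sketch below runs the same experiment against the JDK's own stack (plain DOM plus the default Transformer, not JAXB). RoundTrip, serialize and parses are hypothetical names. It serializes a document whose text contains U+0000 and then tries to parse the result back. Based on the behaviour described in the question, the expectation is that serialization succeeds and parsing fails, but the exact serialized form (a raw character or a reference such as &#0;) may differ between JDK versions, so treat it as an experiment rather than a specification.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public final class RoundTrip {
    private RoundTrip() {}

    /** Builds a "value" element containing text and serializes it with the default Transformer. */
    public static String serialize(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("value");
            root.setTextContent(text);  // the JDK DOM does not appear to check character data here
            doc.appendChild(root);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the default DocumentBuilder accepts xml, false if it reports a parse error. */
    public static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler());  // rethrows fatal errors, prints nothing
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = serialize("a\u0000b");
        // Make a raw NUL visible, in case the serializer emitted one instead of a reference
        System.out.println("serialized: " + xml.replace("\0", "<NUL>"));
        System.out.println("parses back: " + parses(xml));
    }
}
```

Pointing the javax.xml.transform.TransformerFactory system property at a different implementation is one way to repeat the test with another serializer.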


Is there some elegant and global solution?

If the problem you are trying to solve is how to send a \u0000 or \u000B, then you need to apply some application-specific encoding to the String before you insert it into the DOM, and the other end needs to apply the equivalent decoding.
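One possible application-specific encoding, purely as an illustration: double every backslash and write each character outside the XML 1.0 Char range as a backslash, "u" and four hex digits before the string goes into the DOM, then reverse this after reading. The scheme and the class name XmlSafeCodec are hypothetical; any reversible encoding (Base64 for the whole value, for instance) would do, provided both ends agree on it.

```java
/**
 * Hypothetical application-specific codec: makes any Java string safe to put into XML
 * text by doubling each backslash and writing each code point outside the XML 1.0 Char
 * production as a backslash, 'u' and four hex digits. decode() reverses it.
 */
public final class XmlSafeCodec {
    private XmlSafeCodec() {}

    private static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static String encode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp == '\\') {
                sb.append("\\\\");
            } else if (isXmlChar(cp)) {
                sb.appendCodePoint(cp);
            } else {
                // every illegal code point is below 0x10000, so four hex digits always suffice
                sb.append(String.format("\\u%04X", cp));
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static String decode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                sb.append(c);
            } else if (i + 1 < s.length() && s.charAt(i + 1) == '\\') {
                sb.append('\\');
                i += 1;  // skip the second backslash
            } else if (i + 6 <= s.length() && s.charAt(i + 1) == 'u') {
                sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 5;  // skip 'u' and the four hex digits
            } else {
                throw new IllegalArgumentException("Malformed escape at index " + i);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "a\u0000b\\c";
        String wire = encode(original);  // contains only legal XML characters
        System.out.println(wire);
        System.out.println(decode(wire).equals(original));  // true
    }
}
```

The sender would call encode on each affected value before inserting it into the DOM (or handing it to JAXB), and the receiver would call decode after parsing. Doubling the backslash is what keeps the scheme reversible when the original text already contains a backslash followed by "u".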

If the problem you are trying to solve is how to detect the bad data before it is too late, you could do this with an output stream filter between the serializer and the final output stream. But if you detect the badness, there is no good (i.e. transparent to the XML consumer) way to fix it.
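A fail-fast check along those lines might look like the sketch below. It is a character-level variant, a java.io.FilterWriter rather than a byte-level FilterOutputStream, which avoids having to decode bytes in the output encoding. The class name XmlCharCheckingWriter is hypothetical. It only catches illegal characters that reach the writer raw; a serializer that emits them as numeric character references such as &#0; will get past it, and catching those would need an additional scan for references.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/**
 * Hypothetical fail-fast filter: throws as soon as a character outside the XML 1.0
 * Char production is written through it. Limitations: surrogate code units are passed
 * through without checking that they are properly paired, and character references
 * such as "&#0;" are not detected, because they consist only of legal characters.
 */
public class XmlCharCheckingWriter extends FilterWriter {

    public XmlCharCheckingWriter(Writer out) {
        super(out);
    }

    private static void check(char c) throws IOException {
        // 0x20..0xFFFD covers [#x20-#xD7FF], the surrogates, and [#xE000-#xFFFD]
        boolean legal = c == 0x9 || c == 0xA || c == 0xD || (c >= 0x20 && c <= 0xFFFD);
        if (!legal) {
            throw new IOException(String.format("Illegal XML character U+%04X in output", (int) c));
        }
    }

    @Override
    public void write(int c) throws IOException {
        check((char) c);  // Writer.write(int) only uses the low 16 bits
        super.write(c);
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            check(cbuf[i]);
        }
        super.write(cbuf, off, len);  // nothing from this chunk is forwarded if any char is illegal
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            check(str.charAt(i));
        }
        super.write(str, off, len);
    }

    /** Convenience check: true if s passes through the filter unchanged. */
    public static boolean accepts(String s) {
        StringWriter sink = new StringWriter();
        try (Writer w = new XmlCharCheckingWriter(sink)) {
            w.write(s);
        } catch (IOException e) {
            return false;
        }
        return sink.toString().equals(s);
    }

    public static void main(String[] args) {
        System.out.println(accepts("<v>ok</v>"));        // true
        System.out.println(accepts("<v>a\u0000b</v>"));  // false: raw NUL
        System.out.println(accepts("<v>&#0;</v>"));      // true: the reference slips through
    }
}
```

To use it, wrap the real destination, for example new StreamResult(new XmlCharCheckingWriter(fileWriter)) for a Transformer, or pass the wrapped writer to Marshaller.marshal(Object, Writer). Earlier chunks will already have been written when it fires, so all it can do is abort the output.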