What is an XML infoset and in what ways is it different to an XML document?



XML is not text. XML "is" the XML infoset. This may then be serialized into text in an XML document, but it is the XML infoset that is the reality.

The infoset may exist in memory as a DOM tree, for instance; that is, as an implementation of an abstract object model.
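Python's standard library happens to ship two different in-memory object models for XML: a W3C-style DOM (`xml.dom.minidom`) and ElementTree. As a small sketch (the sample document and the helper names `via_dom`/`via_etree` are made up for illustration), both models can be asked for the same pieces of information and agree, even though their APIs and internal structures differ.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

SOURCE = '<book lang="en"><title>Dune</title></book>'

def via_dom(text):
    # DOM model: Document -> documentElement -> child elements -> Text nodes.
    doc = xml.dom.minidom.parseString(text)
    root = doc.documentElement
    title = root.getElementsByTagName("title")[0]
    return root.tagName, root.getAttribute("lang"), title.firstChild.data

def via_etree(text):
    # ElementTree model: character data lives in the .text attribute.
    root = ET.fromstring(text)
    return root.tag, root.get("lang"), root.find("title").text

print(via_dom(SOURCE))
print(via_etree(SOURCE))
```

Both calls yield the element name, the attribute value, and the character data; the information is the same, only the in-memory representation differs.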

Suppose I serialized it once as UTF-8 and once as UTF-16. The results would be two different sequences of bytes, but the same infoset.
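A minimal sketch of that experiment with Python's `xml.etree.ElementTree` (the element name and text are arbitrary): the same tree is serialized in two encodings, the byte strings differ, and parsing either one back gives the same tree.

```python
import xml.etree.ElementTree as ET

root = ET.Element("greeting", lang="en")
root.text = "héllo"

# Same tree, two physical encodings.
utf8_bytes = ET.tostring(root, encoding="utf-8")
utf16_bytes = ET.tostring(root, encoding="utf-16")  # adds a BOM and an XML declaration

# Parse each byte string back and re-serialize to a Python str for comparison.
from_utf8 = ET.tostring(ET.fromstring(utf8_bytes), encoding="unicode")
from_utf16 = ET.tostring(ET.fromstring(utf16_bytes), encoding="unicode")

print(utf8_bytes == utf16_bytes)  # the bits differ
print(from_utf8 == from_utf16)    # the parsed content is the same
```

Comparing re-serialized strings is only a convenient proxy for "same tree" here; a stricter comparison would use a canonical form such as C14N.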

Consider also that with text it makes sense to do things like string concatenation. But you can't just concatenate a "<" into the middle of an XML element; you have to escape it first. Why would you have to do this if XML were just text? If you used the DOM, for instance, you'd simply write element.InnerText = "<"; and when serialized, the "<" would be escaped as "&lt;". The serialized text contains "&lt;", but the infoset still holds the single character "<".
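The snippet above uses .NET's DOM. Here is the same idea sketched with Python's ElementTree instead, where the `text` attribute plays roughly the role of `InnerText`: setting the character through the tree API escapes it automatically, naive concatenation produces something that is not well-formed, and three different textual spellings all parse to the same character.

```python
import xml.etree.ElementTree as ET

# Tree API: assign the character; the serializer escapes it.
elem = ET.Element("a")
elem.text = "<"
serialized = ET.tostring(elem, encoding="unicode")

# String concatenation: the result is not well-formed XML.
naive = "<a>" + "<" + "</a>"
try:
    ET.fromstring(naive)
    naive_ok = True
except ET.ParseError:
    naive_ok = False

# Entity reference, character reference, and CDATA: different text,
# but each element's character data is the single character "<".
spellings = ["<a>&lt;</a>", "<a>&#60;</a>", "<a><![CDATA[<]]></a>"]
texts = [ET.fromstring(s).text for s in spellings]

print(serialized)
print(naive_ok)
print(texts)
```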


A useful way to think about the distinction between XML text and the XML infoset is to consider Fast Infoset, a binary representation of the XML infoset.

So you have an abstract "infoset", which is a conceptual model representing XML data (nodes, elements, attributes, etc.). It can be physically represented as a text XML document or as a Fast Infoset stream. Both represent the same data, but in radically different ways.
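To make the "same data, different physical representation" point concrete without a Fast Infoset library, here is a toy binary encoding of an ElementTree tree. It is explicitly not Fast Infoset (which is standardized as ITU-T X.891 and is far more elaborate); the byte layout and the function names `encode`/`decode` are invented for this sketch, and it only covers elements, attributes, and character data (no comments, processing instructions, or namespace handling).

```python
import struct
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Toy format (NOT Fast Infoset): each string is a 4-byte signed big-endian
# length followed by its UTF-8 bytes; length -1 means "no value" (None).

def _put_str(out: bytearray, s: Optional[str]) -> None:
    if s is None:
        out.extend(struct.pack(">i", -1))
    else:
        data = s.encode("utf-8")
        out.extend(struct.pack(">i", len(data)))
        out.extend(data)

def _get_str(buf: bytes, pos: int) -> Tuple[Optional[str], int]:
    (n,) = struct.unpack_from(">i", buf, pos)
    pos += 4
    if n < 0:
        return None, pos
    return buf[pos:pos + n].decode("utf-8"), pos + n

def _encode(elem: ET.Element, out: bytearray) -> None:
    # Record: tag, attribute count + pairs, text, tail, child count + children.
    _put_str(out, elem.tag)
    out.extend(struct.pack(">I", len(elem.attrib)))
    for name, value in elem.attrib.items():
        _put_str(out, name)
        _put_str(out, value)
    _put_str(out, elem.text)
    _put_str(out, elem.tail)
    out.extend(struct.pack(">I", len(elem)))
    for child in elem:
        _encode(child, out)

def _decode(buf: bytes, pos: int) -> Tuple[ET.Element, int]:
    tag, pos = _get_str(buf, pos)
    elem = ET.Element(tag)
    (n_attrs,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    for _ in range(n_attrs):
        name, pos = _get_str(buf, pos)
        value, pos = _get_str(buf, pos)
        elem.set(name, value)
    elem.text, pos = _get_str(buf, pos)
    elem.tail, pos = _get_str(buf, pos)
    (n_children,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    for _ in range(n_children):
        child, pos = _decode(buf, pos)
        elem.append(child)
    return elem, pos

def encode(elem: ET.Element) -> bytes:
    out = bytearray()
    _encode(elem, out)
    return bytes(out)

def decode(buf: bytes) -> ET.Element:
    elem, _ = _decode(buf, 0)
    return elem

# Usage: text XML -> tree -> binary bytes -> tree, then compare.
doc = ET.fromstring('<order id="7"><item qty="2">pen</item><note>a &lt; b</note></order>')
blob = encode(doc)
roundtrip = decode(blob)
same = ET.tostring(roundtrip, encoding="unicode") == ET.tostring(doc, encoding="unicode")
print(blob[:12], same)
```

The binary blob looks nothing like the angle-bracket text, yet decoding it recovers the same elements, attributes, and character data.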


A valid XML document fulfills the requirements of a DTD or XSD (or another schema language). A document can be well-formed and still be invalid if it violates the rules of the given DTD or XSD.
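Python's standard library parses XML but does not include a DTD or XSD validator, so this sketch checks well-formedness with ElementTree and uses a hand-written rule as a hypothetical stand-in for a schema. The `order`/`item` vocabulary and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    # Well-formedness: the parser accepts the syntax.
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def is_valid_order(text: str) -> bool:
    # Hypothetical stand-in for a DTD/XSD: the root must be <order> with an
    # "id" attribute and at least one <item> child. Not-well-formed input
    # can never be valid.
    if not is_well_formed(text):
        return False
    root = ET.fromstring(text)
    return root.tag == "order" and "id" in root.attrib and root.find("item") is not None

good = '<order id="1"><item/></order>'
well_formed_but_invalid = '<order><note/></order>'
broken = '<order><item></order>'

for doc in (good, well_formed_but_invalid, broken):
    print(doc, is_well_formed(doc), is_valid_order(doc))
```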

Edit: I am new to this area of XML, but it looks like the infoset is the 'abstract level' description of the parts of an XML document, independent of any particular technical implementation, which could be, for example, a Document Object Model (DOM) implementation.