what actually is PCDATA and CDATA?

what actually is PCDATA and CDATA?


From WIKI:

PCDATA

Simply speaking, PCDATA stands for Parsed Character Data. That means the characters are to be parsed by the XML, XHTML, or HTML parser (&lt; will be changed to <, <p> will be taken to mean a paragraph tag, etc.). Compare that with CDATA, where the characters are not to be parsed by the XML, XHTML, or HTML parser.

CDATA

The term CDATA, meaning character data, is used for distinct, but related purposes in the markup languages SGML and XML. The term indicates that a certain portion of the document is general character data, rather than non-character data or character data with a more specific, limited structure.


Both PCDATA and CDATA are parsed. They are both character data.

They both must include only valid characters. For example, if your document encoding is UTF-8, the content of CDATA sections must still be valid UTF-8. So random binary data will probably prevent the document from being well-formed. Also, CDATA sections are still parsed, if only to find the closing ]]> delimiter. But markup-like characters such as <, > and & are not interpreted inside the section and are passed as-is by the parser.
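To make that concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree (my choice for illustration; the element name and sample strings are made up). It assumes the default expat-based parser, which rejects raw control characters even inside a CDATA section.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, <, > and & are plain characters, not markup.
doc = "<note><![CDATA[if (a < b && b > c) { <p> }]]></note>"
cdata_text = ET.fromstring(doc).text
print(cdata_text)

# The section is still scanned for its closing "]]>", and its content
# must still consist of legal XML characters. U+0001 is not one of them,
# so this document is not well-formed even though the byte sits in CDATA.
bad_doc = "<note><![CDATA[\x01]]></note>"
try:
    ET.fromstring(bad_doc)
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)
```

The first document comes back with the text untouched, while the second is rejected, which is the sense in which CDATA is "still parsed".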

On the other hand, in PCDATA a literal < or & (and ' or " in attribute values) must be escaped, or it will be interpreted as markup. Entity references will also be expanded.
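A short sketch of the PCDATA side, again with Python's xml.etree.ElementTree and made-up element and attribute names. It only illustrates the escaping rules described above and is not a complete treatment of entities.

```python
import xml.etree.ElementTree as ET

# An unescaped "<" in PCDATA is read as the start of a tag, so parsing fails.
try:
    ET.fromstring("<expr>a < b</expr>")
    raw_ok = True
except ET.ParseError:
    raw_ok = False

# Written with entity references, the same text parses, and the parser
# expands &lt; and &amp; back into the literal characters.
pcdata_text = ET.fromstring("<expr>a &lt; b &amp;&amp; c</expr>").text

# In an attribute value, the quote character used as the delimiter
# must be escaped as well.
attr = ET.fromstring('<a title="say &quot;hi&quot;"/>').get("title")

print(raw_ok, pcdata_text, attr)
```

After parsing, the application sees the same characters whether they were written as escaped PCDATA or inside a CDATA section; the difference is only in how they have to be written in the source document.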

So yes, CDATA sections are indeed parsed. I am not sure why you were told that PCDATA is not parsed though.


PCDATA - Parsed Character Data

CDATA - (Unparsed) Character Data

http://www.w3schools.com/XML/xml_cdata.asp