What are best practices for designing XML schemas? [closed]



One general (but important!) recommendation is never to store multiple logical pieces of data in a single node (be it a text node or an attribute node). Otherwise, you end up needing your own parsing logic on top of the XML parsing logic you normally get for free from your framework.

So in your coordinate example, <coordinate x="0" y="1" /> and <coordinate> <x>0</x> <y>1</y> </coordinate> are both reasonable to me.

But <coordinate> 0,1 </coordinate> isn't very good, because it stores two logical pieces of data (the X-coordinate and the Y-coordinate) in a single XML node, forcing the consumer to parse the data outside of their XML parser. And while splitting a string on a comma is simple, there are still ambiguities, such as what should happen if there's an extra comma at the end.
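A schema makes the difference visible. The following is a minimal sketch that declares each of the three forms as a type; the type names CoordinateAsAttributes, CoordinateAsElements and CoordinateAsText are hypothetical, and xs:integer is an assumed value type. In the first two forms the validator checks x and y individually; for the comma-separated form, the best the schema can do is a regular expression over the whole string, and the consumer still has to split it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Form 1: <coordinate x="0" y="1" /> ; each value is a separately typed attribute -->
  <xs:complexType name="CoordinateAsAttributes">
    <xs:attribute name="x" type="xs:integer" use="required"/>
    <xs:attribute name="y" type="xs:integer" use="required"/>
  </xs:complexType>

  <!-- Form 2: <coordinate><x>0</x><y>1</y></coordinate> ; each value is a separately typed child element -->
  <xs:complexType name="CoordinateAsElements">
    <xs:sequence>
      <xs:element name="x" type="xs:integer"/>
      <xs:element name="y" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Form 3 (discouraged): <coordinate>0,1</coordinate> ; the schema can only constrain
       the whole string with a pattern, so the consumer still has to split it -->
  <xs:simpleType name="CoordinateAsText">
    <xs:restriction base="xs:string">
      <xs:pattern value="\s*-?\d+\s*,\s*-?\d+\s*"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

To use one of these, declare an element with that type, for example <xs:element name="coordinate" type="CoordinateAsAttributes"/>; since this schema has no target namespace, the unprefixed type name resolves correctly.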


I agree with cdragon's advice to avoid option #2. The choice between #1 and #3 is largely a matter of style. I like to use attributes for what I consider to be attributes of the entity, and elements for what I consider to be data. Sometimes it's hard to classify, but neither choice is "wrong".
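As a rough illustration of that rule of thumb, here is a sketch of a hypothetical book element (all names are made up for this example): the identifier and language describe the entity itself and are modelled as attributes, while the title and summary are the data and are modelled as child elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <!-- the content of the record: child elements -->
        <xs:element name="title" type="xs:string"/>
        <xs:element name="summary" type="xs:string" minOccurs="0"/>
      </xs:sequence>
      <!-- properties describing the entity itself: attributes (declared after the content model) -->
      <xs:attribute name="id" type="xs:ID" use="required"/>
      <xs:attribute name="lang" type="xs:language"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```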

While we're on the topic of schema design, I'll add my two cents on my preferred level of reuse: maximum reuse of both elements and types. This also makes it easier to reference these components externally ("logically"), for example from a data dictionary stored in a database.

Note that while the "Garden of Eden" schema pattern offers the maximum reuse, it also involves the most work. At the bottom of this post, I've provided links to the other patterns covered in the blog series.

The Garden of Eden approach http://blogs.msdn.com/skaufman/archive/2005/05/10/416269.aspx

This approach is modular: it defines all elements globally and, like the Venetian Blind approach, declares all type definitions globally. Each element is defined globally as an immediate child of the <xs:schema> node, and its type attribute can be set to one of the named complex types.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="TargetNamespace"
           xmlns:TN="TargetNamespace"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <!-- Global root element, typed by a named global complex type (prefixed, since the types live in TargetNamespace) -->
  <xs:element name="BookInformation" type="TN:BookInformationType"/>
  <xs:complexType name="BookInformationType">
    <xs:sequence>
      <xs:element ref="TN:Title"/>
      <xs:element ref="TN:ISBN"/>
      <xs:element ref="TN:Publisher"/>
      <xs:element ref="TN:PeopleInvolved" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="PeopleInvolvedType">
    <xs:sequence>
      <xs:element name="Author"/>
    </xs:sequence>
  </xs:complexType>
  <!-- The remaining elements are also global, so other schemas can reference them -->
  <xs:element name="Title"/>
  <xs:element name="ISBN"/>
  <xs:element name="Publisher"/>
  <xs:element name="PeopleInvolved" type="TN:PeopleInvolvedType"/>
</xs:schema>
The advantage of this approach is that the schemas are reusable. Since both the elements and the types are defined globally, both are available for reuse; this approach offers the maximum amount of reusable content. The disadvantage is that the schema is verbose. This would be an appropriate design when you are creating general libraries in which you can afford to make no assumptions about the scope of the schema elements and types and their use in other schemas, particularly with regard to extensibility and modularity.
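To make the reuse claim concrete, here is a minimal sketch of a second schema that pulls in the Garden of Eden components with xs:import. It assumes the Garden of Eden schema is saved as a file named BookInformation.xsd; that file name, the urn:example:library namespace, and the ReadingList/RecommendedNext names are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="urn:example:library"
           xmlns:TN="TargetNamespace"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <!-- Hypothetical location of the Garden of Eden schema; xs:import must precede the declarations -->
  <xs:import namespace="TargetNamespace" schemaLocation="BookInformation.xsd"/>
  <xs:element name="ReadingList">
    <xs:complexType>
      <xs:sequence>
        <!-- element reuse: the global BookInformation element, same name and namespace -->
        <xs:element ref="TN:BookInformation" maxOccurs="unbounded"/>
        <!-- type reuse: a new local element that takes the global BookInformationType -->
        <xs:element name="RecommendedNext" type="TN:BookInformationType" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because both BookInformation and BookInformationType are global, the importing schema can reuse the element as-is (via ref) or reuse only the type under a new element name. Neither would be possible if they had been declared locally.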


Since every distinct type and element has a single global definition, these canonical particles/components can be related one-to-one to identifiers in a database. Maintaining the associations between the textual XSD particles/components and the database may at first glance seem like a tiresome, ongoing manual task, but SQL Server 2005 can in fact generate canonical schema component identifiers via the statement

CREATE XML SCHEMA COLLECTION

http://technet.microsoft.com/en-us/library/ms179457.aspx

Conversely, to reconstruct a schema from the canonical components, SQL Server 2005 provides the xml_schema_namespace function, called from a SELECT statement:

http://technet.microsoft.com/en-us/library/ms191170.aspx

ca·non·i·cal (mathematics): of an equation, coordinate, etc., "in simplest or standard form" (http://dictionary.reference.com/browse/canonical)

Other schema patterns, easier to construct but less reusable (more "denormalized"/redundant), include:

The Russian Doll approach http://blogs.msdn.com/skaufman/archive/2005/04/21/410486.aspx

The schema has a single global element: the root element. All other elements and types are nested progressively deeper, which gives the pattern its name, since each type fits inside the one above it. Because the elements in this design are declared locally, they cannot be reused through import or include statements.
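For comparison, here is a sketch of the same BookInformation structure as in the Garden of Eden example, rewritten in the Russian Doll style (keeping the original element names and the placeholder TargetNamespace). Only BookInformation is global; everything else is nested inside it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="TargetNamespace"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <!-- The only global declaration: the root element -->
  <xs:element name="BookInformation">
    <xs:complexType>
      <xs:sequence>
        <!-- Local declarations: invisible to other schemas -->
        <xs:element name="Title"/>
        <xs:element name="ISBN"/>
        <xs:element name="Publisher"/>
        <xs:element name="PeopleInvolved" maxOccurs="unbounded">
          <!-- Anonymous type nested one level deeper -->
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Author"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A schema that imports this one can reference BookInformation, but none of its children or their types.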

The Salami Slice approach http://blogs.msdn.com/skaufman/archive/2005/04/25/411809.aspx

All elements are defined globally, but the type definitions are defined locally. This way, other schemas may reuse the elements. With this approach, a global element with its locally defined type provides a complete description of the element's content. Each information "slice" is declared individually and then aggregated back together, and the slices may also be pieced together to construct other schemas.
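The same structure in the Salami Slice style might look like the sketch below, again reusing the Garden of Eden names. Making Author a global element is my own choice, so that every element becomes a slice. Each global element carries its own anonymous type, and BookInformation is assembled from references to the slices.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="TargetNamespace"
           xmlns:TN="TargetNamespace"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <!-- Leaf slices: global elements (untyped, as in the original example) -->
  <xs:element name="Title"/>
  <xs:element name="ISBN"/>
  <xs:element name="Publisher"/>
  <xs:element name="Author"/>
  <!-- A slice with a locally defined (anonymous) type -->
  <xs:element name="PeopleInvolved">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="TN:Author"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- The root is assembled from references to the other slices -->
  <xs:element name="BookInformation">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="TN:Title"/>
        <xs:element ref="TN:ISBN"/>
        <xs:element ref="TN:Publisher"/>
        <xs:element ref="TN:PeopleInvolved" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

One side effect worth knowing: any globally declared element can be the root of a valid instance document, so this schema would also accept a document whose root is just a TN:Title element.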

The Venetian Blind approach http://blogs.msdn.com/skaufman/archive/2005/04/29/413491.aspx

It is similar to the Russian Doll approach in that both use a single global element. The Venetian Blind approach is modular: it names and defines all types globally (as opposed to the Salami Slice approach, which declares elements globally and types locally). Each globally defined type describes an individual "slat" and can be reused by other components. In addition, all the locally declared elements can be namespace-qualified or unqualified (the slats can be "opened" or "closed"), depending on the elementFormDefault attribute setting at the top of the schema.
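A sketch of the same structure in the Venetian Blind style, keeping the Garden of Eden names, might look like this. The xs:string types on the leaf elements are my assumption, since the original example leaves them untyped. Only BookInformation is global; the named types are the reusable slats.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="TargetNamespace"
           xmlns:TN="TargetNamespace"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="unqualified">
  <!-- The only global element -->
  <xs:element name="BookInformation" type="TN:BookInformationType"/>
  <!-- Every type is named and global: each one is a reusable "slat" -->
  <xs:complexType name="BookInformationType">
    <xs:sequence>
      <xs:element name="Title" type="xs:string"/>
      <xs:element name="ISBN" type="xs:string"/>
      <xs:element name="Publisher" type="xs:string"/>
      <xs:element name="PeopleInvolved" type="TN:PeopleInvolvedType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="PeopleInvolvedType">
    <xs:sequence>
      <xs:element name="Author" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

With elementFormDefault="unqualified", as set here, only the root element is in TargetNamespace in an instance document; Title, ISBN and the other local elements must be in no namespace. Changing the setting to "qualified" puts them in the target namespace as well.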