
XML - Referencing Other XML Files


Standards

XInclude is the only standard with any level of support.

  • Several XML editors, including Oxygen and XMLSpy, support it.
  • Several XML parsers, including Xerces, also support it, and there are .NET ports too.
  • Several XML tools, such as Saxon, support it, both for Java and .NET.

There are some good examples of use in the Wikipedia article on XInclude.
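As a rough illustration of what XInclude processing looks like in practice, here is a minimal sketch using Python's standard-library xml.etree.ElementInclude module. The in-memory DOCUMENTS table, the load_from_memory helper, and the contents of person.xml are assumptions made up for this example; a real setup would normally let the default loader read files from disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-in for files on disk, keyed by href.
DOCUMENTS = {
    "person.xml": "<person><name>Ada</name></person>",
}

GROUP_XML = """\
<group xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="person.xml"/>
</group>
"""


def load_from_memory(href, parse, encoding=None):
    """Loader callback in the shape ElementInclude expects."""
    text = DOCUMENTS[href]
    if parse == "xml":
        return ET.fromstring(text)
    return text  # parse="text" includes are returned as plain strings


def expand_includes(xml_text):
    """Parse xml_text and replace every xi:include with its target."""
    root = ET.fromstring(xml_text)
    ElementInclude.include(root, loader=load_from_memory)
    return root


if __name__ == "__main__":
    print(ET.tostring(expand_includes(GROUP_XML), encoding="unicode"))
```

Note that the expansion is an explicit step here: parsing alone leaves the xi:include element untouched, which matches the point that support has to be present and switched on.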

XLink is a tangentially related standard, meant less for including documents than for citing portions of other documents. It's not well supported.

Alternatives

If you are worried about size, there are several ways to go:

  • Use a streaming XML processor, such as DataDirect XQuery (or, to a lesser extent, Saxon 9.3 EE), which keeps only as much information in memory as is necessary to solve the query.
  • Use an XML database, such as MarkLogic or eXist.
  • Use one XML file to list the names of other XML files, which a program written in XQuery or XSLT then reads with the doc() function and processes. (Unless your processor is streaming or can dispose of documents it has finished with, as DDXQ and Saxon can, you will still run into the same size problem, though.)
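The last option, a manifest file that names the other files, can be sketched outside XQuery or XSLT as well. The version below uses Python's ElementTree instead of doc(), and everything about the layout (the manifest.xml name, the file elements with a name attribute, the person elements being counted) is a hypothetical choice for the example rather than any standard.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def count_people(manifest_path):
    """Count <person> elements across every file named in the manifest."""
    base_dir = os.path.dirname(os.path.abspath(manifest_path))
    total = 0
    for entry in ET.parse(manifest_path).getroot().iter("file"):
        # Parse one part at a time. The previous tree becomes garbage as
        # soon as `part` is rebound, so memory follows the largest part
        # rather than the sum of all parts.
        part = ET.parse(os.path.join(base_dir, entry.get("name")))
        total += sum(1 for _ in part.getroot().iter("person"))
    return total


def write_demo_files(directory):
    """Create a hypothetical manifest plus two small part files."""
    parts = {
        "part1.xml": "<group><person/><person/></group>",
        "part2.xml": "<group><person/></group>",
    }
    for name, content in parts.items():
        with open(os.path.join(directory, name), "w", encoding="utf-8") as f:
            f.write(content)
    manifest_path = os.path.join(directory, "manifest.xml")
    with open(manifest_path, "w", encoding="utf-8") as f:
        f.write('<files><file name="part1.xml"/><file name="part2.xml"/></files>')
    return manifest_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        print(count_people(write_demo_files(directory)))
```

Each part is still loaded whole, so this only helps when no single part is too large on its own.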


There are a couple of "standard" ways to do what you want, namely XLink and XInclude (depending on what you want to do), though you have to make sure that you have a processor that can pull in the external references. Most XML libraries don't come with this functionality already enabled.

Then you'd be able to do something like:

<group>
  <personlink xlink:href="person.xml" xlink:show="embed" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</group>
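Since most libraries will not act on xlink:show="embed" by themselves, pulling in the reference usually means writing the step yourself. The following is a hand-rolled sketch with Python's ElementTree, not an XLink implementation: embed_links, load_from_memory, and the contents of person.xml are all hypothetical, and the choice to append the target as a child of the linking element is just one possible reading of "embed".

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = "{%s}href" % XLINK_NS
SHOW = "{%s}show" % XLINK_NS

# Hypothetical stand-in for person.xml on disk.
DOCUMENTS = {"person.xml": "<person><name>Ada</name></person>"}

GROUP_XML = (
    "<group>"
    '<personlink xlink:href="person.xml" xlink:show="embed" '
    'xmlns:xlink="http://www.w3.org/1999/xlink"/>'
    "</group>"
)


def load_from_memory(href):
    """Return the parsed root element of the referenced document."""
    return ET.fromstring(DOCUMENTS[href])


def embed_links(root, load=load_from_memory):
    """Append the target of every xlink:show="embed" link to its element."""
    # Snapshot the elements first so newly appended children are not visited.
    for element in list(root.iter()):
        if element.get(SHOW) == "embed" and element.get(HREF) is not None:
            element.append(load(element.get(HREF)))
    return root


if __name__ == "__main__":
    print(ET.tostring(embed_links(ET.fromstring(GROUP_XML)), encoding="unicode"))
```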

However, you probably don't really need this. If you need a subset of information from a large document, you can easily use XSLT or XQuery to trim out the parts that you need. You can combine this approach with SAX parsing, which is event based and never holds the whole document in memory, to scale your application to handle fairly large documents.
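To make the SAX idea concrete, here is a small sketch using Python's standard xml.sax module. It pulls out only the text of name elements and keeps nothing else; the element names and the sample input are assumptions for illustration.

```python
import io
import xml.sax


class NameCollector(xml.sax.ContentHandler):
    """Collect the text of each <name> element as events arrive."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._inside_name = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "name":
            self._inside_name = True
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so buffer until the end tag.
        if self._inside_name:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "name":
            self.names.append("".join(self._chunks))
            self._inside_name = False


def collect_names(stream):
    """Stream-parse `stream` (a binary file-like object or a path)."""
    handler = NameCollector()
    xml.sax.parse(stream, handler)
    return handler.names


if __name__ == "__main__":
    sample = b"<group><person><name>Ada</name></person></group>"
    print(collect_names(io.BytesIO(sample)))
```

Only the handler's own state grows with the input, so memory use depends on what you choose to keep rather than on the size of the document.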

Even while using DOM, I didn't start to see problems with large documents until they were in the tens of megabytes range.


The XML specification also defines DTDs, in which you can declare entities that refer to external files.

A simple document like:

<!DOCTYPE test [
    <!ENTITY ref SYSTEM "file:///C:/test.txt" >
]>
<test>
    &ref;
</test>

And file:///C:/test.txt being:

<blah>
Fee
Fi
Fo
Fum
</blah>

will expand the original document to:

<test>
    <blah>
    Fee
    Fi
    Fo
    Fum
    </blah>
</test>

I believe non-validating XML parsers are not required to expand external entity references, so be cautious there.

Also, don't forget to put standalone="no" in the XMLDecl. (Omitting the standalone attribute is treated as "no", but it's still better to state it explicitly.)
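The caution about parsers not expanding external entities is easy to see with Python's xml.sax, which has left external general entities switched off by default since Python 3.7.1 for security reasons. The sketch below parses the same document twice, once with the feature off and once with it on. The file names test.xml and ref.xml, the relative system identifier, and the Recorder class are assumptions for the example.

```python
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class Recorder(ContentHandler):
    """Record element names and character data as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.chunks = []

    def startElement(self, name, attrs):
        self.elements.append(name)

    def characters(self, content):
        self.chunks.append(content)


def parse_file(path, expand_external):
    """Parse `path`, optionally letting the parser fetch external entities."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, expand_external)
    recorder = Recorder()
    parser.setContentHandler(recorder)
    parser.parse(path)
    return recorder.elements, "".join(recorder.chunks).strip()


def write_demo_files(directory):
    """Write a document and the file its entity points at (relative path)."""
    with open(os.path.join(directory, "ref.xml"), "w", encoding="utf-8") as f:
        f.write("<blah>FeeFiFoFum</blah>")
    main_path = os.path.join(directory, "test.xml")
    with open(main_path, "w", encoding="utf-8") as f:
        f.write(
            '<?xml version="1.0" standalone="no"?>\n'
            '<!DOCTYPE test [ <!ENTITY ref SYSTEM "ref.xml"> ]>\n'
            "<test>&ref;</test>"
        )
    return main_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        path = write_demo_files(directory)
        print(parse_file(path, expand_external=False))
        print(parse_file(path, expand_external=True))
```

Only turn the feature on for documents you trust, since an external entity can point at any file or URL the parser is able to read.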