What's the difference between PHP's DOM and SimpleXML extensions?

What's the difference between PHP's DOM and SimpleXML extensions?


In a nutshell:

SimpleXml

  • is for simple XML and/or simple use cases
  • limited API to work with nodes (e.g. cannot program to an interface that much)
  • all nodes are of the same kind (element node is the same as attribute node)
  • nodes are magically accessible, e.g. $root->foo->bar['attribute']
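To make the "magic" access concrete, here is a minimal sketch. The XML snippet and the variable names are made up for illustration; only the SimpleXML calls themselves are real.

```php
<?php
// Hypothetical document, invented for this example.
$root = simplexml_load_string(
    '<root><foo><bar attribute="x">hello</bar></foo></root>'
);

// Child elements are reached as properties, attributes as array keys.
$bar  = $root->foo->bar;
$text = (string) $bar;               // text content of <bar>
$attr = (string) $bar['attribute'];  // value of the attribute

// Elements and attributes both come back as the same class.
$sameClass = get_class($bar) === get_class($bar['attribute']);
```

Note that the property path mirrors the element names, so code like this only works for documents shaped exactly like the sample.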

DOM

  • is for any XML use case you might have
  • is an implementation of the W3C DOM API (found implemented in many languages)
  • differentiates between various Node Types (more control)
  • much more verbose due to explicit API (can code to an interface)
  • can parse broken HTML
  • allows you to use PHP functions in XPath queries
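The last two points can be sketched together, roughly like this. The malformed HTML string and the choice of strtolower are assumptions picked for the example; loadHTML(), DOMXPath, registerPhpFunctions() and the php: namespace URI are the documented API.

```php
<?php
// Hypothetical, deliberately malformed HTML (unclosed <p> and <b>).
$html = '<html><body><p class="Intro">Hello <b>world<p class="other">Bye</body></html>';

$dom = new DOMDocument();
// Collect libxml's complaints about the markup instead of emitting warnings.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Expose a PHP function to XPath under the php: prefix.
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('strtolower');

// Match the class case-insensitively by lower-casing it in PHP.
$nodes = $xpath->query(
    '//p[php:functionString("strtolower", @class) = "intro"]'
);
$count     = $nodes->length;
$isElement = $nodes->item(0) instanceof DOMElement;
```

Passing a function name to registerPhpFunctions() keeps the set of callable functions explicit, which seems safer than registering everything.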

Both of these are based on libxml and can be influenced to some extent by the libxml functions.


Personally, I don't like SimpleXml too much. That's because I don't like the implicit access to the nodes, e.g. $foo->bar[1]->baz['attribute']. It ties the actual XML structure to the programming interface. The one-node-type-for-everything is also somewhat unintuitive because the behavior of the SimpleXMLElement magically changes depending on its contents.

For instance, when you have <foo bar="1"/> the object dump of /foo/@bar will be identical to that of /foo, but doing an echo of them will print different results. Moreover, because both of them are SimpleXMLElement objects, you can call the same methods on them, but they will only get applied when the SimpleXMLElement supports it, e.g. trying to do $el->addAttribute('foo', 'bar') on the first SimpleXMLElement (the attribute) will do nothing. Now of course it is correct that you cannot add an attribute to an attribute node, but the point is, a proper attribute node would not expose that method in the first place.
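A small sketch of that difference, using the same <foo bar="1"/> document. The variable names are mine; the check with method_exists() is just one way to show which node class exposes what.

```php
<?php
$sx = simplexml_load_string('<foo bar="1"/>');

// SimpleXML: the element and the attribute are both SimpleXMLElement objects.
$sxElement   = $sx->xpath('/foo')[0];
$sxAttribute = $sx->xpath('/foo/@bar')[0];
$sxClasses   = [get_class($sxElement), get_class($sxAttribute)];
// Casting to string is what tells them apart.
$sxStrings   = [(string) $sxElement, (string) $sxAttribute];

// DOM: the same two XPath results are distinct node classes.
$dom = new DOMDocument();
$dom->loadXML('<foo bar="1"/>');
$xpath = new DOMXPath($dom);
$domElement   = $xpath->query('/foo')->item(0);
$domAttribute = $xpath->query('/foo/@bar')->item(0);
$domClasses   = [get_class($domElement), get_class($domAttribute)];

// Only the element class exposes setAttribute().
$elementCanSet   = method_exists($domElement, 'setAttribute');
$attributeCanSet = method_exists($domAttribute, 'setAttribute');
```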

But that's just my 2c. Make up your own mind :)


On a sidenote, there are not just two parsers in PHP, but a couple more. SimpleXml and DOM are just the two that parse a document into a tree structure. The others are either pull or event based parsers/readers/writers.

Also see my answer to


I'm going to make this answer as short as possible so that beginners can take it away easily. I'm also slightly simplifying things for brevity's sake. Jump to the end of this answer for the overstated TL;DR version.


DOM and SimpleXML aren't actually two different parsers. The real parser is libxml2, which is used internally by DOM and SimpleXML. So DOM/SimpleXML are just two ways to use the same parser and they provide ways to convert one object to another.

SimpleXML is intended to be very simple, so it has a small set of functions and is focused on reading and writing data. That is, you can easily read or write an XML file, you can update some values or remove some nodes (with some limitations!), and that's it. No fancy manipulation, and you don't have access to the less common node types. For instance, SimpleXML cannot create a CDATA section although it can read them.
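As a rough illustration of that read/update/remove workflow, here is a sketch on a made-up <items> document (the element names and values are assumptions for the example):

```php
<?php
// Hypothetical data, invented for this example.
$xml = simplexml_load_string(
    '<items><item id="1">old</item><item id="2"><![CDATA[a < b]]></item></items>'
);

$xml->item[0] = 'new';                // overwrite the text of the first <item>
$cdataText = (string) $xml->item[1];  // CDATA content reads like ordinary text
unset($xml->item[1]);                 // remove the second <item>

// Add a new element with an attribute.
$xml->addChild('item', 'added')->addAttribute('id', '3');

$remaining = count($xml->item);
$output    = $xml->asXML();
```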

DOM offers a full-fledged implementation of the DOM plus a couple of non-standard methods such as appendXML. If you're used to manipulating the DOM in JavaScript, you'll find the same methods in PHP's DOM. There's basically no limitation on what you can do, and it even handles HTML. The flipside to this richness of features is that it is more complex and more verbose than SimpleXML.
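For comparison, a short sketch of the kind of manipulation DOM allows. The document content is invented; createCDATASection(), createDocumentFragment() and appendXML() are the actual method names.

```php
<?php
$dom  = new DOMDocument('1.0', 'UTF-8');
$root = $dom->appendChild($dom->createElement('root'));

// Standard DOM: create a CDATA section, which SimpleXML cannot do.
$note = $root->appendChild($dom->createElement('note'));
$note->appendChild($dom->createCDATASection('a < b & c'));

// Non-standard helper: parse a raw XML string into a fragment and append it.
$fragment = $dom->createDocumentFragment();
$fragment->appendXML('<item id="1">first</item><item id="2">second</item>');
$root->appendChild($fragment);

// Moving a node: appending a node that is already in the tree relocates it.
$root->appendChild($note);

$xml = $dom->saveXML($root);
```

The verbosity is visible even in this small case: every node is created explicitly and then attached explicitly.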


Side-note

People often wonder/ask what extension they should use to handle their XML or HTML content. Actually the choice is easy because there isn't much of a choice to begin with:

  • if you need to deal with HTML, you don't really have a choice: you have to use DOM
  • if you have to do anything fancy such as moving nodes or appending some raw XML, again you pretty much have to use DOM
  • if all you need to do is read and/or write some basic XML (e.g. exchanging data with an XML service or reading an RSS feed) then you can use either. Or both.
  • if your XML document is so big that it doesn't fit in memory, you can't use either and you have to use XMLReader, which is also based on libxml2 and is even more annoying to use, but still plays nice with the others
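For the last case, a common pattern is to let XMLReader walk the document and hand one small chunk at a time to SimpleXML. This is only a sketch: the feed string is invented, and with a real large file you would call $reader->open($path) instead of loading a string.

```php
<?php
// Hypothetical feed; stands in for a file too large to load at once.
$feed = '<rss><channel>'
      . '<item><title>One</title></item>'
      . '<item><title>Two</title></item>'
      . '</channel></rss>';

$reader = new XMLReader();
$reader->XML($feed);

$titles = [];

// Skip forward to the first <item>.
while ($reader->read() && $reader->name !== 'item') {
    // nothing to do, just advancing
}

// Handle one <item> at a time, then jump to its next sibling.
while ($reader->name === 'item') {
    // Only the current <item> is turned into a SimpleXML tree.
    $item = simplexml_load_string($reader->readOuterXml());
    $titles[] = (string) $item->title;
    $reader->next('item');
}

$reader->close();
```

Only one <item> subtree is held as a SimpleXML object at any moment, which is the point of the exercise.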

TL;DR

  • SimpleXML is super easy to use but only good for 90% of use cases.
  • DOM is more complex, but can do everything.
  • XMLReader is super complicated, but uses very little memory. Very situational.


As others have pointed out, the DOM and SimpleXML extensions are not strictly "XML parsers", rather they are different interfaces to the structure generated by the underlying libxml2 parser.

The SimpleXML interface treats XML as a serialized data structure, in the same way you would treat a decoded JSON string. So it provides quick access to the contents of a document, with emphasis on accessing elements by name, and reading their attributes and text content (including automatically folding in entities and CDATA sections). It supports documents containing multiple namespaces (primarily using the children() and attributes() methods), and can search a document using an XPath expression. It also includes support for basic manipulation of the content - e.g. adding or overwriting elements or attributes with a new string.
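A brief sketch of those features on a made-up feed; the namespace URI http://example.com/meta and the element names are assumptions for the example.

```php
<?php
// Hypothetical namespaced document, invented for this example.
$xml = simplexml_load_string(
    '<feed xmlns:m="http://example.com/meta">'
    . '<entry m:rank="2"><title>Tom &amp; Jerry</title><m:tag>cartoon</m:tag></entry>'
    . '</feed>'
);

$entry = $xml->entry;
$title = (string) $entry->title;  // the &amp; entity is decoded for you

// Namespaced children and attributes are selected by namespace URI.
$ns   = 'http://example.com/meta';
$tag  = (string) $entry->children($ns)->tag;
$rank = (string) $entry->attributes($ns)->rank;

// XPath search, with the prefix registered explicitly.
$xml->registerXPathNamespace('m', $ns);
$tags     = $xml->xpath('//m:tag');
$tagCount = count($tags);
```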

The DOM interface, on the other hand, treats XML as a structured document, where the representation used is as important as the data represented. It therefore provides much more granular and explicit access to different types of "node", such as entities and CDATA sections, as well as some which are ignored by SimpleXML, such as comments and processing instructions. It also provides a much richer set of manipulation functions, allowing you to rearrange nodes and choose how to represent text content, for instance. The tradeoff is a fairly complex API, with a large number of classes and methods; since it implements a standard API (originally developed for manipulating HTML in JavaScript), there may be less of a "natural PHP" feel, but some programmers may be familiar with it from other contexts.

Both interfaces require the full document to be parsed into memory, and effectively wrap up pointers into that parsed representation; you can even switch between the two wrappers with simplexml_import_dom() and dom_import_simplexml(), for instance to add a "missing" feature to SimpleXML using a function from the DOM API. For larger documents, the "pull-based" XMLReader or the "event-based" XML Parser may be more appropriate.
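Switching wrappers might look something like this; the sketch uses DOM to add a CDATA section to a document that is otherwise handled through SimpleXML. The document and variable names are invented for the example.

```php
<?php
$sx = simplexml_load_string('<doc><body/></doc>');

// SimpleXML has no method for creating CDATA, so borrow DOM's.
$bodyNode = dom_import_simplexml($sx->body);  // DOMElement for the same node
$bodyNode->appendChild(
    $bodyNode->ownerDocument->createCDATASection('<b>raw</b>')
);

// The SimpleXML object sees the change because both wrap the same tree.
$text = (string) $sx->body;
$xml  = $sx->body->asXML();

// And back again: wrap a DOM node in a SimpleXMLElement.
$again    = simplexml_import_dom($bodyNode);
$sameText = (string) $again;
```

Neither call copies the document; they only create a different kind of wrapper around the same underlying node.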