What is the fastest XML parser in PHP?


The fastest parser will be SAX: it doesn't have to build a DOM, and it can work on partial XML or process a document progressively. Information on PHP's SAX parser (the Expat-compatible XML Parser extension) can be found in the PHP manual. Alternatively, there is a libxml-based tree parser named SimpleXML. A DOM-style parser is easier to work with, but it is typically much slower because it has to build the whole document tree in memory first.
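To illustrate the progressive parsing described above, here is a minimal sketch using the XML Parser extension. The sample feed, the chunk boundaries, and the names `$chunks`, `$titles`, and `$buffer` are made up for illustration; a real script would more likely feed the parser from `fread()` in a loop.

```php
<?php
// Sketch of event-based (SAX-style) parsing, fed in arbitrary chunks.
// The chunks below are hypothetical sample data; note that they split
// the document in the middle of text and in the middle of a tag.
$chunks = [
    '<feed><item><title>Fir',
    'st</title></item><item><ti',
    'tle>Second</title></item></feed>',
];

$titles  = [];
$inTitle = false;
$buffer  = '';

$parser = xml_parser_create();
// Keep element names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    // Called for every opening tag.
    function ($parser, $name, $attrs) use (&$inTitle, &$buffer) {
        if ($name === 'title') {
            $inTitle = true;
            $buffer  = '';
        }
    },
    // Called for every closing tag.
    function ($parser, $name) use (&$inTitle, &$buffer, &$titles) {
        if ($name === 'title') {
            $inTitle  = false;
            $titles[] = $buffer;
        }
    }
);

// Character data may arrive in several pieces, so it is accumulated.
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$inTitle, &$buffer) {
        if ($inTitle) {
            $buffer .= $data;
        }
    }
);

$last = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    // The third argument tells the parser whether this is the final chunk.
    if (xml_parse($parser, $chunk, $i === $last) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
}

print_r($titles);
```

No tree is ever built here: the handlers see each tag once and keep only what they choose to keep, which is where the speed and memory advantage comes from.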


This is geared primarily toward those who are starting out with XML parsing and are not sure which parser to use.

There are two "big" ways to go about parsing - you can either load the XML into memory and find what you need (DOM, SimpleXML) or you can stream it - read it and execute code based on what you read (XMLReader, SAX).

According to Microsoft, SAX is a "push" parser: it sends every piece of information to your application, and your application processes it. XMLReader is a "pull" parser, which allows you to skip chunks of data and grab only what you need. According to Microsoft, this can both simplify and accelerate your application, and I would assume the .NET and PHP implementations are similar. Your choice depends on your needs: if you're pulling just a few tags out of a larger chunk and can use $xml->next('Element') to skip significant chunks, you may find that XMLReader is faster than SAX.
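As a rough sketch of that skipping idea, the snippet below reads one value near the top of each element and then jumps ahead with `next()`. The feed layout (`Product`, `ASIN`, `Relationships`, `Child`) and the `$seen` bookkeeping are hypothetical and exist only to show which elements the loop actually visits.

```php
<?php
// Sketch: pull parsing with XMLReader, skipping data that is not needed.
// The XML below is made-up sample data, not a real feed.
$xml = <<<XML
<Products>
  <Product>
    <ASIN>B000000001</ASIN>
    <Relationships><Child>1</Child><Child>2</Child></Relationships>
  </Product>
  <Product>
    <ASIN>B000000002</ASIN>
    <Relationships><Child>3</Child></Relationships>
  </Product>
</Products>
XML;

$reader = new XMLReader();
$reader->XML($xml);

$asins = [];
$seen  = []; // element names the loop body actually visited

while ($reader->read()) {
    if ($reader->nodeType !== XMLReader::ELEMENT) {
        continue; // ignore whitespace and closing tags
    }
    $seen[$reader->name] = true;

    if ($reader->name === 'ASIN') {
        $asins[] = $reader->readString();
        // The wanted value is near the top of each <Product>, so skip
        // the remaining siblings and their subtrees. next() may also stop
        // on a closing tag of that name, which is why the loop above
        // still checks the node type.
        $reader->next('Product');
    }
}
$reader->close();

print_r($asins);
print_r(array_keys($seen));
```

The `Relationships` and `Child` elements never reach the loop body, which is the saving a pull parser offers when most of each element is irrelevant.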

When parsing "small" (<30 KB, 700 lines) XML files repeatedly, you might not expect a huge time difference between the parsing methods. I was surprised to find that there was one. I ran a comparison of a small feed processed with SimpleXML and with XMLReader. Hopefully this will help someone else visualize how significant the difference is. For a real-life comparison, this is parsing the responses to two Amazon MWS Product Information requests.

Each Parse Time is the time required to take 2 XML strings and return about 120 variables containing values from each string. Each loop takes different data, but each of the tests was on the same data in the same order.

SimpleXML loads the document into memory. I used microtime to check both the time to complete the parse (extract the relevant values), as well as the time spent creating the element (when new SimpleXMLElement($xml) was called). I have rounded these to 4 decimal places.
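The measurement described above can be sketched roughly as follows. The sample document and the variable names are assumptions for illustration; the original test extracted about 120 values from two real feeds.

```php
<?php
// Sketch: timing SimpleXML construction separately from value extraction.
// The XML string is hypothetical sample data.
$xml = '<Products><Product><ASIN>B000000001</ASIN>'
     . '<Title>Example</Title></Product></Products>';

$start = microtime(true);

// Building the element loads the whole document into memory.
$doc   = new SimpleXMLElement($xml);
$built = microtime(true);

// Extracting values only walks the tree that already exists.
$asin  = (string) $doc->Product->ASIN;
$title = (string) $doc->Product->Title;
$end   = microtime(true);

printf("Parse Time: %.4f seconds\n", $end - $start);
printf("Time Spent Making the Elements: %.4f seconds\n", $built - $start);
```

On a document this small both numbers will be close to zero; the split only becomes informative on realistic input.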

Parse Time: 0.5866 seconds
Parse Time: 0.3045 seconds
Parse Time: 0.1037 seconds
Parse Time: 0.0151 seconds
Parse Time: 0.0282 seconds
Parse Time: 0.0622 seconds
Parse Time: 0.7756 seconds
Parse Time: 0.2439 seconds
Parse Time: 0.0806 seconds
Parse Time: 0.0696 seconds
Parse Time: 0.0218 seconds
Parse Time: 0.0542 seconds
__________________________
            2.3500 seconds
            0.1958 seconds average

Time Spent Making the Elements: 0.5232 seconds
Time Spent Making the Elements: 0.2974 seconds
Time Spent Making the Elements: 0.0980 seconds
Time Spent Making the Elements: 0.0097 seconds
Time Spent Making the Elements: 0.0231 seconds
Time Spent Making the Elements: 0.0091 seconds
Time Spent Making the Elements: 0.7190 seconds
Time Spent Making the Elements: 0.2410 seconds
Time Spent Making the Elements: 0.0765 seconds
Time Spent Making the Elements: 0.0637 seconds
Time Spent Making the Elements: 0.0081 seconds
Time Spent Making the Elements: 0.0507 seconds
______________________________________________
                                2.1195 seconds
                                0.1766 seconds average

Over 90% of the total time is spent loading elements into the DOM. Only 0.2305 seconds is spent locating the elements and returning them.

With XMLReader, which is stream based, I was able to skip a significant chunk of one of the XML feeds, since the data I wanted was near the top of each element. "Your Mileage May Vary."

Parse Time: 0.1059 seconds
Parse Time: 0.0169 seconds
Parse Time: 0.0214 seconds
Parse Time: 0.0665 seconds
Parse Time: 0.0255 seconds
Parse Time: 0.0241 seconds
Parse Time: 0.0234 seconds
Parse Time: 0.0225 seconds
Parse Time: 0.0183 seconds
Parse Time: 0.0202 seconds
Parse Time: 0.0245 seconds
Parse Time: 0.0205 seconds
__________________________
            0.3897 seconds
            0.0325 seconds average

What is striking is that although locating elements is slightly faster in SimpleXML once it is all loaded, it is actually over 6 times faster to use XMLReader overall.

You can find some information on using XMLReader in the question "How to use XMLReader in PHP?"


Each XML extension has its own strengths and weaknesses. For example, I have a script that parses the XML data dump from Stack Overflow. The posts.xml file is 2.8GB! For this large XML file, I had to use XMLReader because it reads XML in a streaming mode, instead of trying to load and represent the whole XML document in memory at once, as the DOM extension does.
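A minimal sketch of that streaming approach is shown below. It assumes the dump stores each post as a `<row>` element with attributes such as `PostTypeId` and `Score`; the tiny temporary file stands in for the real posts.xml, and the counters are purely illustrative.

```php
<?php
// Sketch: streaming a file with XMLReader so that only the current node
// is held in memory. The file written here is a small stand-in for a
// multi-gigabyte dump; its layout is an assumption.
$path = tempnam(sys_get_temp_dir(), 'posts');
file_put_contents(
    $path,
    '<posts>'
    . '<row Id="1" PostTypeId="1" Score="5"/>'
    . '<row Id="2" PostTypeId="2" Score="3"/>'
    . '<row Id="3" PostTypeId="1" Score="7"/>'
    . '</posts>'
);

$reader = new XMLReader();
if (!$reader->open($path)) {
    throw new RuntimeException("Cannot open $path");
}

$questions  = 0;
$totalScore = 0;

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'row') {
        // Attributes are read straight off the current node.
        if ($reader->getAttribute('PostTypeId') === '1') {
            $questions++;
            $totalScore += (int) $reader->getAttribute('Score');
        }
    }
}

$reader->close();
unlink($path);

printf("%d questions, total score %d\n", $questions, $totalScore);
```

Memory use stays roughly constant regardless of file size, because each row is discarded as soon as the reader moves on.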

So you need to be more specific about describing how you are going to use the XML, in order to decide which PHP extension to use.

All of PHP's XML extensions provide some method to read XML data from a string.
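For reference, here is a short sketch of loading the same string with each extension. The sample document and variable names are invented for the example, and error handling is omitted for brevity.

```php
<?php
// Sketch: reading one hypothetical XML string with four PHP extensions.
$xml = '<book id="42"><title>XML in PHP</title></book>';

// SimpleXML: whole document in memory, object-style access.
$sx            = simplexml_load_string($xml);
$fromSimpleXml = (string) $sx->title;

// DOM: whole document in memory, W3C DOM API.
$dom = new DOMDocument();
$dom->loadXML($xml);
$fromDom = $dom->getElementsByTagName('title')->item(0)->textContent;

// XMLReader: forward-only pull parser.
$reader = new XMLReader();
$reader->XML($xml);
$fromReader = null;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'title') {
        $fromReader = $reader->readString();
        break;
    }
}
$reader->close();

// XML Parser (SAX family): here using the struct helper for brevity.
// Tag names are upper-cased by default (case folding).
$parser  = xml_parser_create();
$values  = [];
xml_parse_into_struct($parser, $xml, $values);
$fromSax = null;
foreach ($values as $node) {
    if ($node['tag'] === 'TITLE' && isset($node['value'])) {
        $fromSax = $node['value'];
    }
}

var_dump($fromSimpleXml, $fromDom, $fromReader, $fromSax);
```

All four end up with the same title; what differs is how much of the document each one keeps in memory along the way.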