Best way to process large XML in PHP [duplicate]
For a large file, you'll want to use a SAX parser rather than a DOM parser.
A DOM parser reads the whole file and loads it into an object tree in memory. A SAX parser reads the file sequentially and calls your user-defined callback functions to handle the data (start tags, end tags, character data, etc.).
With a SAX parser you'll need to maintain state yourself (e.g. which tag you are currently in), which makes it a bit more complicated, but for a large file it will be much more memory-efficient.
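As a rough illustration of the callback-and-state idea, here is a minimal sketch using PHP's built-in XML Parser functions (xml_parser_create() and friends). The collectTitles() helper and the item/title element names are hypothetical and only there for the example; the file is read in 8 KB chunks so it is never held in memory as a whole.

```php
<?php
// Minimal SAX-style sketch with PHP's XML Parser extension.
// collectTitles() and the <item>/<title> element names are hypothetical.

function collectTitles(string $path): array
{
    $titles = [];   // results gathered by the callbacks
    $stack = [];    // state we maintain ourselves: the currently open tags
    $buffer = '';   // character data of the <title> being read

    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Start tag: push it on the stack and reset the text buffer if needed.
        function ($parser, string $name, array $attrs) use (&$stack, &$buffer) {
            $stack[] = $name;
            if ($name === 'title') {
                $buffer = '';
            }
        },
        // End tag: store the collected text and pop the tag off the stack.
        function ($parser, string $name) use (&$stack, &$buffer, &$titles) {
            if ($name === 'title') {
                $titles[] = trim($buffer);
            }
            array_pop($stack);
        }
    );

    xml_set_character_data_handler(
        $parser,
        // Text can arrive in several pieces, so append rather than assign.
        function ($parser, string $data) use (&$stack, &$buffer) {
            if (end($stack) === 'title') {
                $buffer .= $data;
            }
        }
    );

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        // Feed the parser one chunk at a time; the last call sets the final flag.
        while (!feof($handle)) {
            $chunk = fread($handle, 8192);
            if ($chunk === false) {
                throw new RuntimeException("Cannot read from $path");
            }
            if (!xml_parse($parser, $chunk, feof($handle))) {
                throw new RuntimeException(sprintf(
                    'XML error "%s" at line %d',
                    xml_error_string(xml_get_error_code($parser)),
                    xml_get_current_line_number($parser)
                ));
            }
        }
    } finally {
        fclose($handle);
    }

    return $titles;
}

// Small demo on a throwaway file standing in for the large document.
$tmp = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($tmp, '<items><item><title>First</title></item><item><title>Second</title></item></items>');
print_r(collectTitles($tmp));
unlink($tmp);
```

The $stack array is the "state" part: the character data handler only knows which element the text belongs to because the start and end handlers keep the stack up to date.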
My take on it:
https://github.com/prewk/XmlStreamer
A simple class that will extract all children of the XML root element while streaming the file. Tested on a 108 MB XML file from pubmed.com.
class SimpleXmlStreamer extends XmlStreamer
{
    public function processNode($xmlString, $elementName, $nodeIndex)
    {
        $xml = simplexml_load_string($xmlString);

        // Do something with your SimpleXML object

        return true;
    }
}

$streamer = new SimpleXmlStreamer("myLargeXmlFile.xml");
$streamer->parse();
When using a DOMDocument with large XML files, don't forget to pass the LIBXML_PARSEHUGE flag in the options of the load() method. (The same applies to the other load methods of the DOMDocument object.)
$checkDom = new \DOMDocument('1.0', 'UTF-8');
$checkDom->load($filePath, LIBXML_PARSEHUGE);
(Works with a 120 MB XML file.)
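For the other load methods, the flag goes in the same options argument. A small sketch with loadXML(), which parses a string instead of a file; the catalog/book sample data is made up for the example. Note that the whole tree is still built in memory.

```php
<?php
// Sketch: LIBXML_PARSEHUGE passed to DOMDocument::loadXML().
// The XML string below is a made-up sample standing in for a large document.
$xml = '<catalog><book id="1"/><book id="2"/></catalog>';

$dom = new DOMDocument('1.0', 'UTF-8');
$loaded = $dom->loadXML($xml, LIBXML_PARSEHUGE);

if ($loaded === false) {
    throw new RuntimeException('Could not parse the XML string');
}

// Count the <book> elements to confirm the document was parsed.
$bookCount = $dom->getElementsByTagName('book')->length;
echo $bookCount, PHP_EOL;
```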