What is the best way to parse (big) XML in C# Code?

c# xml xml-serialization xmlreader

Use XmlReader to parse large XML documents. XmlReader provides fast, forward-only, non-cached access to XML data. (Forward-only means you can read the XML file from beginning to end but cannot move backwards in the file.) XmlReader uses little memory and, in that respect, is comparable to a simple SAX reader.

    using (XmlReader myReader = XmlReader.Create(@"c:\data\coords.xml"))
    {
        while (myReader.Read())
        {
            // Process each node (myReader.Value) here
            // ...
        }
    }

You can use XmlReader to process files that are up to 2 gigabytes (GB) in size.
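
To make the "process each node" step a little more concrete, here is a minimal sketch of what the loop body might look like. The `point` element and its `x`/`y` attributes are made up for illustration (the layout of coords.xml is not given), as are the `PointScanner` and `ReadPoints` names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class PointScanner
{
    // Streams through the XML once and collects "x,y" for every <point> element.
    // ASSUMPTION: <point x=".." y=".."/> is a hypothetical layout, not the real coords.xml.
    public static List<string> ReadPoints(TextReader input)
    {
        var points = new List<string>();
        var settings = new XmlReaderSettings
        {
            IgnoreWhitespace = true,
            IgnoreComments = true
        };

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            while (reader.Read())
            {
                // Only start tags are of interest; text, end tags etc. are skipped.
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "point")
                {
                    // GetAttribute does not move the reader, so the plain
                    // while (reader.Read()) loop stays correct.
                    points.Add(reader.GetAttribute("x") + "," + reader.GetAttribute("y"));
                }
            }
        }
        return points;
    }
}
```

Only the current node is held in memory at any time, so the same loop should behave the same way on a small test string as on a very large file opened with `File.OpenText`.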

Ref: How to read XML from a file by using Visual C#

As at 14 May 2009: I've switched to using a hybrid approach... see code below.

This version has most of the advantages of both:
* the XmlReader/XmlTextReader (memory efficiency --> speed); and
* the XmlSerializer (code-gen --> development expediency and flexibility).

It uses an XmlReader to iterate through the document and creates "doclets", which it deserializes using the XmlSerializer and "XML binding" classes generated with XSD.EXE.

I guess this recipe is universally applicable, and it's fast... I'm parsing a 201 MB XML Document containing 56,000 GML Features in about 7 seconds... the old VB6 implementation of this application took minutes (or even hours) to parse large extracts... so I'm lookin' good to go.

Once again, a BIG Thank You to the forumites for donating your valuable time. I really appreciate it.

Cheers all. Keith.

    using System;
    using System.Reflection;
    using System.Xml;
    using System.Xml.Serialization;
    using System.IO;
    using System.Collections.Generic;
    using nrw_rime_extract.utils;
    using nrw_rime_extract.xml.generated_bindings;

    namespace nrw_rime_extract.xml
    {
        internal interface ExtractXmlReader
        {
            rimeType read(string xmlFilename);
        }

        /// <summary>
        /// RimeExtractXml provides bindings to the RIME Extract XML as defined by
        /// $/Release 2.7/Documentation/Technical/SCHEMA and DTDs/nrw-rime-extract.xsd
        /// </summary>
        internal class ExtractXmlReader_XmlSerializerImpl : ExtractXmlReader
        {
            private Log log = Log.getInstance();

            public rimeType read(string xmlFilename)
            {
                log.write(
                    string.Format(
                        "DEBUG: ExtractXmlReader_XmlSerializerImpl.read({0})",
                        xmlFilename));
                using (Stream stream = new FileStream(xmlFilename, FileMode.Open))
                {
                    return read(stream);
                }
            }

            internal rimeType read(Stream xmlInputStream)
            {
                // create an instance of the XmlSerializer class,
                // specifying the type of object to be deserialized.
                XmlSerializer serializer = new XmlSerializer(typeof(rimeType));
                serializer.UnknownNode += new XmlNodeEventHandler(handleUnknownNode);
                serializer.UnknownAttribute +=
                    new XmlAttributeEventHandler(handleUnknownAttribute);
                // use the Deserialize method to restore the object's state
                // with data from the XML document.
                return (rimeType)serializer.Deserialize(xmlInputStream);
            }

            protected void handleUnknownNode(object sender, XmlNodeEventArgs e)
            {
                log.write(
                    string.Format(
                        "XML_ERROR: Unknown Node at line {0} position {1} : {2}\t{3}",
                        e.LineNumber, e.LinePosition, e.Name, e.Text));
            }

            protected void handleUnknownAttribute(object sender, XmlAttributeEventArgs e)
            {
                log.write(
                    string.Format(
                        "XML_ERROR: Unknown Attribute at line {0} position {1} : {2}='{3}'",
                        e.LineNumber, e.LinePosition, e.Attr.Name, e.Attr.Value));
            }
        }

        /// <summary>
        /// ExtractXmlReader provides bindings to the extract.xml
        /// returned by the RIME server; as defined by:
        ///   $/Release X/Documentation/Technical/SCHEMA and
        /// DTDs/nrw-rime-extract.xsd
        /// </summary>
        internal class ExtractXmlReader_XmlTextReaderXmlSerializerHybridImpl :
            ExtractXmlReader
        {
            private Log log = Log.getInstance();

            public rimeType read(string xmlFilename)
            {
                log.write(
                    string.Format(
                        "DEBUG: ExtractXmlReader_XmlTextReaderXmlSerializerHybridImpl." +
                        "read({0})",
                        xmlFilename));
                using (XmlReader reader = XmlReader.Create(xmlFilename))
                {
                    return read(reader);
                }
            }

            public rimeType read(XmlReader reader)
            {
                rimeType result = new rimeType();
                // a deserializer for featureClass, feature, etc, "doclets"
                Dictionary<Type, XmlSerializer> serializers =
                    new Dictionary<Type, XmlSerializer>();
                serializers.Add(typeof(featureClassType),
                    newSerializer(typeof(featureClassType)));
                serializers.Add(typeof(featureType),
                    newSerializer(typeof(featureType)));

                List<featureClassType> featureClasses = new List<featureClassType>();
                List<featureType> features = new List<featureType>();
                while (!reader.EOF)
                {
                    if (reader.MoveToContent() != XmlNodeType.Element)
                    {
                        reader.Read(); // skip non-element-nodes and unknown-elements.
                        continue;
                    }

                    // skip junk nodes.
                    if (reader.Name.Equals("featureClass"))
                    {
                        using (
                            StringReader elementReader =
                                new StringReader(reader.ReadOuterXml()))
                        {
                            XmlSerializer deserializer =
                                serializers[typeof (featureClassType)];
                            featureClasses.Add(
                                (featureClassType)
                                deserializer.Deserialize(elementReader));
                        }
                        continue; // ReadOuterXml advances the reader, so don't read again.
                    }

                    if (reader.Name.Equals("feature"))
                    {
                        using (
                            StringReader elementReader =
                                new StringReader(reader.ReadOuterXml()))
                        {
                            XmlSerializer deserializer =
                                serializers[typeof (featureType)];
                            features.Add(
                                (featureType)
                                deserializer.Deserialize(elementReader));
                        }
                        continue; // ReadOuterXml advances the reader, so don't read again.
                    }

                    log.write(
                        "WARNING: unknown element '" + reader.Name +
                        "' was skipped during parsing.");
                    reader.Read(); // skip non-element-nodes and unknown-elements.
                }
                result.featureClasses = featureClasses.ToArray();
                result.features = features.ToArray();
                return result;
            }

            private XmlSerializer newSerializer(Type elementType)
            {
                XmlSerializer serializer = new XmlSerializer(elementType);
                serializer.UnknownNode += new XmlNodeEventHandler(handleUnknownNode);
                serializer.UnknownAttribute +=
                    new XmlAttributeEventHandler(handleUnknownAttribute);
                return serializer;
            }

            protected void handleUnknownNode(object sender, XmlNodeEventArgs e)
            {
                log.write(
                    string.Format(
                        "XML_ERROR: Unknown Node at line {0} position {1} : {2}\t{3}",
                        e.LineNumber, e.LinePosition, e.Name, e.Text));
            }

            protected void handleUnknownAttribute(object sender, XmlAttributeEventArgs e)
            {
                log.write(
                    string.Format(
                        "XML_ERROR: Unknown Attribute at line {0} position {1} : {2}='{3}'",
                        e.LineNumber, e.LinePosition, e.Attr.Name, e.Attr.Value));
            }
        }
    }
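
The listing above depends on the generated bindings and a project-specific `Log` class, so it will not compile on its own. For anyone who wants to try the pattern in isolation, here is a stripped-down sketch of the same idea (an XmlReader walks the document, and each element of interest is handed to an XmlSerializer as a "doclet"). The `Feature` class is a hand-written, hypothetical stand-in for an XSD.EXE-generated type, and `FeatureStream`, `ReadFeatures`, the `id` attribute and the `name` element are likewise assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// ASSUMPTION: hypothetical stand-in for a class generated by XSD.EXE.
[XmlRoot("feature")]
public class Feature
{
    [XmlAttribute("id")]
    public string Id { get; set; }

    [XmlElement("name")]
    public string Name { get; set; }
}

public static class FeatureStream
{
    // Create the serializer once and reuse it for every doclet.
    private static readonly XmlSerializer FeatureSerializer =
        new XmlSerializer(typeof(Feature));

    // Lazily yields one Feature per <feature> element found in the input.
    public static IEnumerable<Feature> ReadFeatures(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "feature")
                {
                    // ReadOuterXml returns the whole element as a string and
                    // advances the reader past it, so there is no Read() here.
                    using (var doclet = new StringReader(reader.ReadOuterXml()))
                    {
                        yield return (Feature)FeatureSerializer.Deserialize(doclet);
                    }
                }
                else
                {
                    reader.Read(); // move on from any other node
                }
            }
        }
    }
}
```

Unlike the listing above, this variant yields each object as it is read instead of collecting everything into lists, which is one possible choice when the caller can process features one at a time.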

Just to summarise, and to make the answer a bit more obvious for anyone who finds this thread via Google:

Prior to .NET 2 the XmlTextReader was the most memory efficient XML parser available in the standard API (thanx Mitch;-)

.NET 2 introduced the XmlReader.Create factory method, which is better again. The reader it returns is a forward-only element iterator (a bit like a StAX parser). (thanx Cerebrus;-)

And remember kiddies, if any XML instance has the potential to be bigger than about 500k, DON'T USE DOM!

Cheers all. Keith.
