Which language is easiest and fastest to work with XML content? Which language is easiest and fastest to work with XML content? xml xml

Which language is easiest and fastest to work with XML content?


A dynamic language rules for this. Why? The mappings are easy to code and change. You don't have to recompile and rebuild.

Indeed, with a little cleverness, you can have your "XML XPATH to a Tag -> DB table-field" mappings as disjoint blocks of Python code that your main application imports.

The block of Python code is your configuration file. It's not an .ini file or a .properties file that describes a configuration. It is the configuration.

We use Python, xml.etree and the SQLAlchemy (to separate the SQL out of your programs) for this because we're up and running with very little effort and a great deal of flexibility.


source.py

"""A particular XML parser.  Formats change, so sometimes this changes, too."""import xml.etree.ElementTree as xmlclass SSXML_Source( object ):    ns0= "urn:schemas-microsoft-com:office:spreadsheet"    ns1= "urn:schemas-microsoft-com:office:excel"    def __init__( self, aFileName, *sheets ):        """Initialize a XML source.        XXX - Create better sheet filtering here, in the constructor.        @param aFileName: the file name.        """        super( SSXML_Source, self ).__init__( aFileName )        self.log= logging.getLogger( "source.PCIX_XLS" )        self.dom= etree.parse( aFileName ).getroot()    def sheets( self ):        for wb in self.dom.getiterator("{%s}Workbook" % ( self.ns0, ) ):            for ws in wb.getiterator( "{%s}Worksheet" % ( self.ns0, ) ):                yield ws    def rows( self ):        for s in self.sheets():            print s.attrib["{%s}Name" % ( self.ns0, ) ]            for t in s.getiterator( "{%s}Table" % ( self.ns0, ) ):                for r in t.getiterator( "{%s}Row" % ( self.ns0, ) ):                    # The XML may not be really useful.                    # In some cases, you may have to convert to something useful                    yield r

model.py

"""This is your target object.  It's part of the problem domain; it rarely changes."""class MyTargetObject( object ):    def __init__( self ):        self.someAttr= ""        self.anotherAttr= ""        self.this= 0        self.that= 3.14159    def aMethod( self ):        """etc."""        pass

builder_today.py One of many mapping configurations

"""One of many builders.  This changes all the time to fitspecific needs and situations.  The goal is to keep thisshort and to-the-point so that it has the mapping and nothingbut the mapping."""import modelclass MyTargetBuilder( object ):    def makeFromXML( self, element ):        result= model.MyTargetObject()        result.someAttr= element.findtext( "Some" )        result.anotherAttr= element.findtext( "Another" )        result.this= int( element.findtext( "This" ) )        result.that= float( element.findtext( "that" ) )        return result

loader.py

"""An application that maps from XML to the domain objectusing a configurable "builder"."""import modelimport sourceimport builder_1import builder_2import builder_today# Configure this:  pick a builder is appropriate for the data:b= builder_today.MyTargetBuilder()s= source.SSXML_Source( sys.argv[1] )for r in s.rows():    data= b.makeFromXML( r )    # ... persist data with a DB save or file write

To make changes, you can correct a builder or create a new builder. You adjust the loader source to identify which builder will be used. You can, without too much trouble, make the selection of builder a command-line parameter. Dynamic imports in dynamic languages seem like overkill to me, but they are handy.


XSLT

I suggest using XSLT templates to transform the XML into INSERT statements (or whatever you need), as required.
You should be able to invoke XSLT from any of the languages you mention.

This will result in a lot less code than doing it the long way round.


In .NET, C# 3.0 and VB9 provide excellent support for working with XML using LINQ to XML:

LINQ to XML Overview