Which language is easiest and fastest to work with XML content?
A dynamic language rules for this. Why? The mappings are easy to code and change. You don't have to recompile and rebuild.
Indeed, with a little cleverness, you can have your "XML XPATH to a Tag -> DB table-field" mappings as disjoint blocks of Python code that your main application imports.
The block of Python code is your configuration file. It's not an .ini
file or a .properties
file that describes a configuration. It is the configuration.
We use Python, xml.etree and the SQLAlchemy (to separate the SQL out of your programs) for this because we're up and running with very little effort and a great deal of flexibility.
source.py
"""A particular XML parser. Formats change, so sometimes this changes, too."""import xml.etree.ElementTree as xmlclass SSXML_Source( object ): ns0= "urn:schemas-microsoft-com:office:spreadsheet" ns1= "urn:schemas-microsoft-com:office:excel" def __init__( self, aFileName, *sheets ): """Initialize a XML source. XXX - Create better sheet filtering here, in the constructor. @param aFileName: the file name. """ super( SSXML_Source, self ).__init__( aFileName ) self.log= logging.getLogger( "source.PCIX_XLS" ) self.dom= etree.parse( aFileName ).getroot() def sheets( self ): for wb in self.dom.getiterator("{%s}Workbook" % ( self.ns0, ) ): for ws in wb.getiterator( "{%s}Worksheet" % ( self.ns0, ) ): yield ws def rows( self ): for s in self.sheets(): print s.attrib["{%s}Name" % ( self.ns0, ) ] for t in s.getiterator( "{%s}Table" % ( self.ns0, ) ): for r in t.getiterator( "{%s}Row" % ( self.ns0, ) ): # The XML may not be really useful. # In some cases, you may have to convert to something useful yield r
model.py
"""This is your target object. It's part of the problem domain; it rarely changes."""class MyTargetObject( object ): def __init__( self ): self.someAttr= "" self.anotherAttr= "" self.this= 0 self.that= 3.14159 def aMethod( self ): """etc.""" pass
builder_today.py One of many mapping configurations
"""One of many builders. This changes all the time to fitspecific needs and situations. The goal is to keep thisshort and to-the-point so that it has the mapping and nothingbut the mapping."""import modelclass MyTargetBuilder( object ): def makeFromXML( self, element ): result= model.MyTargetObject() result.someAttr= element.findtext( "Some" ) result.anotherAttr= element.findtext( "Another" ) result.this= int( element.findtext( "This" ) ) result.that= float( element.findtext( "that" ) ) return result
loader.py
"""An application that maps from XML to the domain objectusing a configurable "builder"."""import modelimport sourceimport builder_1import builder_2import builder_today# Configure this: pick a builder is appropriate for the data:b= builder_today.MyTargetBuilder()s= source.SSXML_Source( sys.argv[1] )for r in s.rows(): data= b.makeFromXML( r ) # ... persist data with a DB save or file write
To make changes, you can correct a builder or create a new builder. You adjust the loader source to identify which builder will be used. You can, without too much trouble, make the selection of builder a command-line parameter. Dynamic imports in dynamic languages seem like overkill to me, but they are handy.