Reading Maven Pom xml in Python


The main issues of the code in the question are

  • that it doesn't specify namespaces, and
  • that it uses */, which matches exactly one intermediate level, instead of //, which matches descendants at any depth.

As you can see at the top of the XML file, Maven uses the namespace http://maven.apache.org/POM/4.0.0. The attribute xmlns in the root node defines the default namespace. The attribute xmlns:xsi defines a namespace that is only used for xsi:schemaLocation.

    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
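To see what the default namespace does to the parsed tree, here is a minimal sketch using a made-up POM fragment (the sample XML and the variable names are illustrative and not taken from the question's file). ElementTree stores every tag as {namespace-uri}localname, so a lookup by the bare name finds nothing:

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened POM used only for illustration.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <profiles>
    <profile>
      <id>dev</id>
    </profile>
  </profiles>
</project>"""

root = ET.fromstring(SAMPLE_POM)

# The default namespace becomes part of every tag name.
print(root.tag)  # {http://maven.apache.org/POM/4.0.0}project

# A bare tag name does not match a namespaced element.
bare = root.find('profiles')
print(bare)  # None

# The fully qualified name does match.
qualified = root.find('{http://maven.apache.org/POM/4.0.0}profiles')
print(qualified.tag)
```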

To specify tags like profile in methods like find, you have to specify the namespace as well. For example, you could write the following to find all profile-tags.

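    import xml.etree.ElementTree as xml

    pom = xml.parse('pom.xml')
    # './/' searches all descendants of the root. A bare leading '//' still
    # works on an ElementTree object, but it emits a FutureWarning.
    for profile in pom.findall('.//{http://maven.apache.org/POM/4.0.0}profile'):
        print(repr(profile))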

Also note that I'm using //. Using */ would give the same result for the profile tags in your specific XML file above. However, it would not work for other tags like mappings. Since * stands for exactly one level, */child can expand to parent/child or xyz/child, but not to xyz/parent/child.
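The difference between the two path styles can be checked with a small sketch. The nested sample document below is an assumption made for illustration (the real pom.xml from the question may be laid out differently), and the variable names are hypothetical:

```python
import xml.etree.ElementTree as ET

NS = '{http://maven.apache.org/POM/4.0.0}'

# Hypothetical POM: profile sits two levels below the root,
# mappings sits four levels below it.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <profiles>
    <profile>
      <build>
        <mappings>
          <property><name>homepage</name><value>/content/homepage</value></property>
        </mappings>
      </build>
    </profile>
  </profiles>
</project>"""

root = ET.fromstring(SAMPLE_POM)

# '*/' skips exactly one level, so it reaches project/profiles/profile ...
profiles_one_level = root.findall('*/' + NS + 'profile')

# ... but not the more deeply nested mappings element.
mappings_one_level = root.findall('*/' + NS + 'mappings')

# './/' matches descendants at any depth.
mappings_any_depth = root.findall('.//' + NS + 'mappings')

print(len(profiles_one_level), len(mappings_one_level), len(mappings_any_depth))
```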


Now, you should be able to come up with something like this to find all mappings:

    import xml.etree.ElementTree as xml

    pom = xml.parse('pom.xml')
    mappings = {}  # property name -> property value
    for mapping in pom.findall('.//{http://maven.apache.org/POM/4.0.0}mappings'
                               '/{http://maven.apache.org/POM/4.0.0}property'):
        name = mapping.find('{http://maven.apache.org/POM/4.0.0}name').text
        value = mapping.find('{http://maven.apache.org/POM/4.0.0}value').text
        mappings[name] = value

Specifying the namespaces like this is quite verbose. To make it easier to read, you can define a namespace map and pass it as the second argument to find and findall:

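    import xml.etree.ElementTree as xml

    pom = xml.parse('pom.xml')
    mappings = {}  # property name -> property value
    # The prefix 'm' is arbitrary; it only has to match the paths below.
    nsmap = {'m': 'http://maven.apache.org/POM/4.0.0'}
    for mapping in pom.findall('.//m:mappings/m:property', nsmap):
        name = mapping.find('m:name', nsmap).text
        value = mapping.find('m:value', nsmap).text
        mappings[name] = value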
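If you are on Python 3.8 or newer, the namespace map can also use the empty string as a prefix, which applies that namespace to every unprefixed tag name in the path. The following is a minimal sketch under that version assumption; the inline sample document and the names SAMPLE_POM and result are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical POM fragment used only for illustration.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <build>
    <mappings>
      <property><name>homepage</name><value>/content/homepage</value></property>
      <property><name>assets</name><value>/content/assets</value></property>
    </mappings>
  </build>
</project>"""

root = ET.fromstring(SAMPLE_POM)

# Requires Python 3.8+: the '' key sets the namespace for unprefixed names.
nsmap = {'': 'http://maven.apache.org/POM/4.0.0'}

result = {}
for prop in root.findall('.//mappings/property', nsmap):
    name = prop.find('name', nsmap).text
    value = prop.find('value', nsmap).text
    result[name] = value

print(result)
```

The paths then read like the namespace-free version, while the namespace itself is still stated once in nsmap.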


Ok, I found out that when I remove the Maven stuff (the xmlns attributes) from the project element, so it's just <project>, I can do this:

    for mapping in root.findall('*//mappings'):
        logging.info(mapping)
        for prop in mapping.findall('./property'):
            logging.info(prop.find('name').text + " => " + prop.find('value').text)

Which would result in:

    INFO:root:<Element 'mappings' at 0x10d72d350>
    INFO:root:homepage => /content/homepage
    INFO:root:assets => /content/assets

However, if I leave the Maven stuff in at the top I can do this:

    for mapping in root.findall('*//{http://maven.apache.org/POM/4.0.0}mappings'):
        logging.info(mapping)
        for prop in mapping.findall('./{http://maven.apache.org/POM/4.0.0}property'):
            logging.info(prop.find('{http://maven.apache.org/POM/4.0.0}name').text
                         + " => "
                         + prop.find('{http://maven.apache.org/POM/4.0.0}value').text)

Which results in:

    INFO:root:<Element '{http://maven.apache.org/POM/4.0.0}mappings' at 0x10aa7f310>
    INFO:root:homepage => /content/homepage
    INFO:root:assets => /content/assets

However, I'd love to figure out how to avoid having to account for the Maven stuff, since it locks me into this one format.

EDIT:

Ok, I managed to get something a bit more verbose:

    import xml.etree.ElementTree as xml


    def getMappingsNode(node, nodeName):
        children = node.findall('*')
        # Look for a match among the direct children first. Matching on a
        # substring of the tag ignores any '{namespace}' prefix.
        for n in children:
            if nodeName in n.tag:
                return n
        # Otherwise search every child's subtree, not only the last child's.
        for n in children:
            found = getMappingsNode(n, nodeName)
            if found is not None:
                return found
        return None


    def getMappings(rootNode):
        mappingsNode = getMappingsNode(rootNode, 'mappings')
        mapping = {}
        for prop in mappingsNode.findall('*'):
            key = ''
            val = ''
            for child in prop.findall('*'):
                if 'name' in child.tag:
                    key = child.text
                if 'value' in child.tag:
                    val = child.text
            if val and key:
                mapping[key] = val
        return mapping


    pomFile = xml.parse('pom.xml')
    root = pomFile.getroot()
    mappings = getMappings(root)
    print(mappings)
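A shorter way to get the same namespace-agnostic lookup, assuming Python 3.8 or newer, is the {*} wildcard in ElementTree path expressions, which matches a tag name in any namespace or in none. This is only a sketch: the helper name read_mappings and the inline sample documents are hypothetical, and the real pom.xml may be nested differently.

```python
import xml.etree.ElementTree as ET


def read_mappings(root):
    """Collect name/value pairs from every mappings/property element."""
    result = {}
    # Requires Python 3.8+: '{*}tag' matches 'tag' with or without a namespace.
    for prop in root.findall('.//{*}mappings/{*}property'):
        name = prop.findtext('{*}name')
        value = prop.findtext('{*}value')
        if name and value:
            result[name] = value
    return result


# Hypothetical document body, shared by both variants below.
BODY = """
  <profiles><profile><build><mappings>
    <property><name>homepage</name><value>/content/homepage</value></property>
    <property><name>assets</name><value>/content/assets</value></property>
  </mappings></build></profile></profiles>
"""

with_ns = ET.fromstring(
    '<project xmlns="http://maven.apache.org/POM/4.0.0">' + BODY + '</project>')
without_ns = ET.fromstring('<project>' + BODY + '</project>')

print(read_mappings(with_ns))
print(read_mappings(without_ns))
```

Both calls go through the same code path, so the script no longer depends on whether the xmlns attributes are present on the project element.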