Reading Maven Pom xml in Python

The main issues of the code in the question are

  • that it doesn't specify namespaces, and
  • that it uses */ instead of //. The */ form steps through exactly one level, whereas // matches descendants at any depth.

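As you can see at the top of the XML file, Maven uses the namespace http://maven.apache.org/POM/4.0.0. The attribute xmlns in the root node defines the default namespace. The attribute xmlns:xsi defines a namespace that is only used for xsi:schemaLocation.

    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">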
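To see why the namespace matters, here is a small sketch of how ElementTree reports tag names once a default namespace is declared. The inline POM string is a made-up minimal example, not the file from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal POM, used only for illustration.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
</project>"""

root = ET.fromstring(POM)

# The default namespace is folded into every tag name as a {uri} prefix.
print(root.tag)     # {http://maven.apache.org/POM/4.0.0}project
print(root[0].tag)  # {http://maven.apache.org/POM/4.0.0}modelVersion

# A bare tag name therefore finds nothing; the qualified name does.
print(root.find('modelVersion'))  # None
print(root.find('{http://maven.apache.org/POM/4.0.0}modelVersion').text)  # 4.0.0
```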

To specify tags like profile in methods like find, you have to specify the namespace as well. For example, you could write the following to find all profile-tags.

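    import xml.etree.ElementTree as xml

    pom = xml.parse('pom.xml')
    # './/' searches all descendants; the {uri} prefix qualifies the tag name
    for profile in pom.findall('.//{http://maven.apache.org/POM/4.0.0}profile'):
        print(repr(profile))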

Also note that I'm using // (written as .// in ElementTree, because paths are relative to the element being searched). Using */ would give the same result for your specific XML file above. However, it would not work for other tags like mappings. Since * represents exactly one level, */tag can expand to parent/tag or xyz/tag but not to xyz/parent/tag.
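The difference between the two forms is easy to check on a toy tree. This is only a sketch: the element names below are invented, and the tree has no namespace so the paths stay short.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree (no namespace, to keep the paths short).
root = ET.fromstring(
    "<root>"
    "<a><tag>one level down</tag></a>"
    "<b><c><tag>two levels down</tag></c></b>"
    "</root>"
)

# '*/tag' steps through exactly one intermediate element.
one_level = [e.text for e in root.findall('*/tag')]

# './/tag' searches all descendants, at any depth.
any_depth = [e.text for e in root.findall('.//tag')]

print(one_level)  # ['one level down']
print(any_depth)  # ['one level down', 'two levels down']
```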

Now, you should be able to come up with something like this to find all mappings:

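    pom = xml.parse('pom.xml')
    map = {}
    for mapping in pom.findall('.//{http://maven.apache.org/POM/4.0.0}mappings'
                               '/{http://maven.apache.org/POM/4.0.0}property'):
        name  = mapping.find('{http://maven.apache.org/POM/4.0.0}name').text
        value = mapping.find('{http://maven.apache.org/POM/4.0.0}value').text
        map[name] = value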

Specifying the namespaces like this is quite verbose. To make it easier to read, you can define a namespace map and pass it as the second argument to find and findall:

    # ...
    nsmap = {'m': 'http://maven.apache.org/POM/4.0.0'}
    for mapping in pom.findall('.//m:mappings/m:property', nsmap):
        name  = mapping.find('m:name', nsmap).text
        value = mapping.find('m:value', nsmap).text
        map[name] = value

OK, I found out that when I remove the Maven namespace attributes from the project element, so it's just <project>, I can do this:

    for mapping in root.findall('*//mappings'):
        logging.info(mapping)
        for prop in mapping.findall('./property'):
            logging.info(prop.find('name').text + " => " + prop.find('value').text)

Which would result in:

    INFO:root:<Element 'mappings' at 0x10d72d350>
    INFO:root:homepage => /content/homepage
    INFO:root:assets => /content/assets

However, if I leave the Maven stuff in at the top I can do this:

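    for mapping in root.findall('*//{http://maven.apache.org/POM/4.0.0}mappings'):
        logging.info(mapping)
        for prop in mapping.findall('./{http://maven.apache.org/POM/4.0.0}property'):
            logging.info(prop.find('{http://maven.apache.org/POM/4.0.0}name').text + " => " + prop.find('{http://maven.apache.org/POM/4.0.0}value').text)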

Which results in:

    INFO:root:<Element '{http://maven.apache.org/POM/4.0.0}mappings' at 0x10aa7f310>
    INFO:root:homepage => /content/homepage
    INFO:root:assets => /content/assets

However, I'd love to be able to figure out how to avoid having to account for the maven stuff since it locks me into this one format.
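One possible way around this, assuming Python 3.8 or newer, is the {*} wildcard that ElementTree accepts in paths: it matches a tag name in any namespace, or in none. The sketch below is not taken from the question; the POM layout and the read_mappings helper are hypothetical, and only the property names and values come from the output above.

```python
import xml.etree.ElementTree as ET

# Hypothetical POM fragment; only the names and values are from the thread.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <build><plugins><plugin><configuration>
    <mappings>
      <property><name>homepage</name><value>/content/homepage</value></property>
      <property><name>assets</name><value>/content/assets</value></property>
    </mappings>
  </configuration></plugin></plugins></build>
</project>"""


def read_mappings(root):
    """Collect name -> value pairs from every mappings/property element.

    Relies on the '{*}' namespace wildcard (assumes Python 3.8+).
    """
    result = {}
    for prop in root.findall('.//{*}mappings/{*}property'):
        result[prop.find('{*}name').text] = prop.find('{*}value').text
    return result


# Same document with and without the Maven namespace declaration.
with_ns = read_mappings(ET.fromstring(POM))
without_ns = read_mappings(
    ET.fromstring(POM.replace(' xmlns="http://maven.apache.org/POM/4.0.0"', ''))
)

print(with_ns)                # {'homepage': '/content/homepage', 'assets': '/content/assets'}
print(with_ns == without_ns)  # True
```

The same function handles both variants of the file, so the lookup is no longer tied to one namespace URI. On older Python versions the wildcard is not available and the explicit namespace is still needed.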


Ok, I managed to get something a bit more verbose:

    import xml.etree.ElementTree as xml

    def getMappingsNode(node, nodeName):
        # Depth-first search for the first descendant whose tag contains nodeName.
        for n in node.findall('*'):
            if nodeName in n.tag:
                return n
            found = getMappingsNode(n, nodeName)
            if found is not None:
                return found
        return None

    def getMappings(rootNode):
        mappingsNode = getMappingsNode(rootNode, 'mappings')
        mapping = {}
        for prop in mappingsNode.findall('*'):
            key = ''
            val = ''
            # Substring checks ignore the {namespace} prefix on each tag.
            for child in prop.findall('*'):
                if 'name' in child.tag:
                    key = child.text
                if 'value' in child.tag:
                    val = child.text
            if val and key:
                mapping[key] = val
        return mapping

    pomFile = xml.parse('pom.xml')
    root = pomFile.getroot()
    mappings = getMappings(root)
    print(mappings)