Reading Maven Pom xml in Python
The main issues of the code in the question are
- that it doesn't specify namespaces, and
- that it uses */, which only matches a single level, instead of //.
As you can see at the top of the XML file, Maven uses the namespace http://maven.apache.org/POM/4.0.0. The attribute xmlns in the root node defines the default namespace. The attribute xmlns:xsi defines a namespace that is only used for xsi:schemaLocation.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
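To see what the default namespace does to the parsed tree, here is a minimal sketch. The inline POM text is a hypothetical stand-in for the real file; only the namespace URI is taken from the snippet above.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal POM, used only for illustration.
POM_TEXT = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
</project>"""

root = ET.fromstring(POM_TEXT)

# ElementTree folds the default namespace into every tag as {uri}localname.
print(root.tag)
print(root[0].tag)

# A plain name therefore matches nothing, while the qualified name does.
unqualified = root.find('modelVersion')
qualified = root.find('{http://maven.apache.org/POM/4.0.0}modelVersion')
print(unqualified)     # None
print(qualified.text)  # 4.0.0
```

This is why an unqualified search such as find('profile') comes back empty on a real POM.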
To specify tags like profile in methods like find, you have to specify the namespace as well. For example, you could write the following to find all profile tags.

    import xml.etree.ElementTree as xml

    pom = xml.parse('pom.xml')
    # './/' searches all descendants of the root element
    for profile in pom.findall('.//{http://maven.apache.org/POM/4.0.0}profile'):
        print(repr(profile))

Also note that I'm using //. Using */ would have the same result for your specific XML file above. However, it would not work for other tags like mappings. Since * represents only one level, */tag can be expanded to parent/tag or xyz/tag but not to xyz/parent/tag.
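The difference between the two forms can be checked on a small document. This is only a sketch: the element names root, a, b and tag are made up, and namespaces are left out to keep the focus on the path syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with the same tag at two different depths.
DOC = """<root>
  <a><tag>one level down</tag></a>
  <a><b><tag>two levels down</tag></b></a>
</root>"""

root = ET.fromstring(DOC)

# '*/tag' allows exactly one intermediate element between root and tag.
one_level = [e.text for e in root.findall('*/tag')]

# './/tag' matches tag at any depth below root.
any_level = [e.text for e in root.findall('.//tag')]

print(one_level)  # ['one level down']
print(any_level)  # ['one level down', 'two levels down']
```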
Now, you should be able to come up with something like this to find all mappings:

    pom = xml.parse('pom.xml')
    map = {}
    for mapping in pom.findall('.//{http://maven.apache.org/POM/4.0.0}mappings'
                               '/{http://maven.apache.org/POM/4.0.0}property'):
        name = mapping.find('{http://maven.apache.org/POM/4.0.0}name').text
        value = mapping.find('{http://maven.apache.org/POM/4.0.0}value').text
        map[name] = value

Specifying the namespaces like this is quite verbose. To make it easier to read, you can define a namespace map and pass it as the second argument to find and findall:

    # ...
    nsmap = {'m': 'http://maven.apache.org/POM/4.0.0'}
    for mapping in pom.findall('.//m:mappings/m:property', nsmap):
        name = mapping.find('m:name', nsmap).text
        value = mapping.find('m:value', nsmap).text
        map[name] = value

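Put together, the namespace-map approach might look like the following self-contained sketch. The POM layout is an assumption inferred from the queries above, and read_mappings is a hypothetical helper name. It uses findtext instead of find(...).text so that a property with a missing name or value is skipped rather than raising an AttributeError.

```python
import xml.etree.ElementTree as ET

NSMAP = {'m': 'http://maven.apache.org/POM/4.0.0'}

# Hypothetical POM fragment; the nesting around <mappings> is assumed.
POM_TEXT = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <build>
    <plugins>
      <plugin>
        <configuration>
          <mappings>
            <property><name>homepage</name><value>/content/homepage</value></property>
            <property><name>assets</name><value>/content/assets</value></property>
            <property><name>incomplete</name></property>
          </mappings>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>"""


def read_mappings(root):
    """Collect name/value pairs from every mappings/property element."""
    result = {}
    for prop in root.findall('.//m:mappings/m:property', NSMAP):
        # findtext returns None when the child element is absent.
        name = prop.findtext('m:name', namespaces=NSMAP)
        value = prop.findtext('m:value', namespaces=NSMAP)
        if name is not None and value is not None:
            result[name] = value
    return result


mappings = read_mappings(ET.fromstring(POM_TEXT))
print(mappings)
```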
Ok, found out that when I remove the Maven stuff from the project element so it's just <project>, I can do this:

    for mapping in root.findall('*//mappings'):
        logging.info(mapping)
        for prop in mapping.findall('./property'):
            logging.info(prop.find('name').text + " => " + prop.find('value').text)

Which would result in:

    INFO:root:<Element 'mappings' at 0x10d72d350>
    INFO:root:homepage => /content/homepage
    INFO:root:assets => /content/assets

However, if I leave the Maven stuff in at the top I can do this:

    for mapping in root.findall('*//{http://maven.apache.org/POM/4.0.0}mappings'):
        logging.info(mapping)
        for prop in mapping.findall('./{http://maven.apache.org/POM/4.0.0}property'):
            logging.info(prop.find('{http://maven.apache.org/POM/4.0.0}name').text
                         + " => "
                         + prop.find('{http://maven.apache.org/POM/4.0.0}value').text)

Which results in:

    INFO:root:<Element '{http://maven.apache.org/POM/4.0.0}mappings' at 0x10aa7f310>
    INFO:root:homepage => /content/homepage
    INFO:root:assets => /content/assets

However, I'd love to be able to figure out how to avoid having to account for the maven stuff since it locks me into this one format.
EDIT:
Ok, I managed to get something a bit more verbose:

    import xml.etree.ElementTree as xml

    def getMappingsNode(node, nodeName):
        # Depth-first search for the first descendant whose tag contains nodeName.
        for n in node.findall('*'):
            if nodeName in n.tag:
                return n
            # Keep looking in the other children if this subtree has no match.
            found = getMappingsNode(n, nodeName)
            if found is not None:
                return found
        return None

    def getMappings(rootNode):
        mappingsNode = getMappingsNode(rootNode, 'mappings')
        mapping = {}
        if mappingsNode is None:
            return mapping
        for prop in mappingsNode.findall('*'):
            key = ''
            val = ''
            for child in prop.findall('*'):
                if 'name' in child.tag:
                    key = child.text
                if 'value' in child.tag:
                    val = child.text
            if val and key:
                mapping[key] = val
        return mapping

    pomFile = xml.parse('pom.xml')
    root = pomFile.getroot()
    mappings = getMappings(root)
    print(mappings)

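As an alternative to the recursive search above, and assuming Python 3.8 or later, ElementTree path expressions accept the {*} wildcard, which matches a tag in any namespace or in none. The sketch below uses two hypothetical documents, one with the Maven namespace and one without, and read_mappings is again an illustrative name rather than anything from the original code.

```python
import xml.etree.ElementTree as ET

# Two hypothetical inputs: the same structure with and without the namespace.
WITH_NS = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <mappings>
      <property><name>homepage</name><value>/content/homepage</value></property>
    </mappings>
  </properties>
</project>"""

WITHOUT_NS = """<project>
  <properties>
    <mappings>
      <property><name>assets</name><value>/content/assets</value></property>
    </mappings>
  </properties>
</project>"""


def read_mappings(root):
    """Collect name/value pairs regardless of the document's namespace."""
    result = {}
    # '{*}tag' requires Python 3.8+; it matches tag in any or no namespace.
    for prop in root.findall('.//{*}mappings/{*}property'):
        name = prop.findtext('{*}name')
        value = prop.findtext('{*}value')
        if name and value:
            result[name] = value
    return result


with_ns = read_mappings(ET.fromstring(WITH_NS))
without_ns = read_mappings(ET.fromstring(WITHOUT_NS))
print(with_ns)
print(without_ns)
```

The trade-off is that the wildcard also matches same-named elements from unrelated namespaces, so it fits best when the document is known to be a POM.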