XML Parsing with Python and minidom
getElementsByTagName is recursive: you get all descendants with a matching tagName, not just the direct children. Because your Topics contain other Topics that also have Titles, calling it on an outer Topic also returns the Titles of every nested Topic, so the lower-down Titles turn up many times.
If you want only the matching direct children, and you don't have XPath available, you can write a simple filter, e.g.:
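To see the recursion for yourself, here is a minimal sketch. The XML string is a made-up stand-in (the real document isn't shown here), and the names SAMPLE, outer and nested_titles are just for illustration:

```python
from xml.dom import minidom

# Hypothetical sample data: one outer Topic holding two nested Topics.
SAMPLE = """<Topic>
  <Title>My Document</Title>
  <Topic>
    <Title>Overview</Title>
  </Topic>
  <Topic>
    <Title>Basic Features</Title>
  </Topic>
</Topic>"""

doc = minidom.parseString(SAMPLE)
outer = doc.documentElement  # the outermost <Topic>

# getElementsByTagName searches the whole subtree under `outer`,
# so the Titles of the nested Topics are returned as well.
nested_titles = [t.firstChild.data for t in outer.getElementsByTagName('Title')]
print(len(nested_titles))  # three Titles, although `outer` has only one of its own
```

Asking the outer Topic for its 'Title' elements returns three nodes here, even though only one of them is its direct child.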
def getChildrenByTagName(node, tagName):
    # Yield only the element children of node whose tag matches ('*' matches any).
    for child in node.childNodes:
        if child.nodeType==child.ELEMENT_NODE and (tagName=='*' or child.tagName==tagName):
            yield child

for topic in document.getElementsByTagName('Topic'):
    title= list(getChildrenByTagName(topic, 'Title'))[0] # or just next(getChildrenByTagName(...))
    print(title.firstChild.data)
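For something you can paste and run as-is, here is a self-contained sketch of the same direct-children idea. The sample XML is invented for the example, and direct_children and own_titles are hypothetical names, not part of minidom:

```python
from xml.dom import minidom

# Hypothetical sample data standing in for the real document.
SAMPLE = """<Topic>
  <Title>My Document</Title>
  <Topic>
    <Title>Overview</Title>
  </Topic>
  <Topic>
    <Title>Basic Features</Title>
  </Topic>
</Topic>"""

def direct_children(node, tag_name):
    """Yield only the element children of `node` named `tag_name`."""
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == tag_name:
            yield child

doc = minidom.parseString(SAMPLE)
own_titles = []
for topic in doc.getElementsByTagName('Topic'):
    # next() with a default avoids an exception when a Topic has no Title.
    title = next(direct_children(topic, 'Title'), None)
    if title is not None and title.firstChild is not None:
        own_titles.append(title.firstChild.data)

print(own_titles)  # one Title per Topic, in document order
```

Each Topic contributes exactly one Title this way, so nothing is repeated no matter how deeply the Topics are nested.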
The following works:
import xml.dom.minidom
from xml.dom.minidom import Node

dom = xml.dom.minidom.parse("docmap.xml")

def getChildrenByTitle(node):
    for child in node.childNodes:
        if child.localName=='Title':
            yield child

Topic=dom.getElementsByTagName('Topic')
for node in Topic:
    alist=getChildrenByTitle(node)
    for a in alist:
        Title= a.childNodes[0].nodeValue
        print(Title)
I think this can help:
import xml.dom.minidom

f = open("file.xml",'r')
data = f.read()
f.close()

doc = xml.dom.minidom.parseString(data)

# Relies on the i-th Title in document order belonging to the i-th Topic.
i = 0
for topic in doc.getElementsByTagName('Topic'):
    title= doc.getElementsByTagName('Title')[i].firstChild.nodeValue
    print(title)
    i +=1
Output:
My Document
Overview
Basic Features
About This Software
Platforms Supported