How to parse XML and count instances of a particular node attribute?
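I suggest ElementTree. There are other compatible implementations of the same API, such as lxml, and cElementTree in the Python standard library itself (cElementTree was deprecated in Python 3.3 and removed in 3.9; xml.etree.ElementTree now uses the C accelerator automatically when it is available). In this context, what those alternatives chiefly add is even more speed. The ease of programming depends on the API, which ElementTree defines.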
First build an Element instance root from the XML, e.g. with the XML function, or by parsing a file with something like:

    import xml.etree.ElementTree as ET
    root = ET.parse('thefile.xml').getroot()
Or any of the many other ways shown in the ElementTree documentation. Then do something like:

    for type_tag in root.findall('bar/type'):
        value = type_tag.get('foobar')
        print(value)

Similar code patterns are usually just as simple.
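To tie this back to the counting part of the question, here is a minimal, self-contained sketch of the same findall/get pattern. The inline sample document and the names XML_TEXT and count_attribute are assumptions made up for illustration; they are not part of the ElementTree API.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document; the third <type> deliberately lacks "foobar".
XML_TEXT = """
<foo>
  <bar>
    <type foobar="1"/>
    <type foobar="2"/>
    <type other="x"/>
  </bar>
</foo>
""".strip()


def count_attribute(root, path, attr):
    """Count elements under `path` that carry `attr`, and tally its values."""
    values = [
        el.get(attr)
        for el in root.findall(path)
        if el.get(attr) is not None  # skip elements without the attribute
    ]
    return len(values), Counter(values)


root = ET.fromstring(XML_TEXT)  # root is the <foo> element
total, by_value = count_attribute(root, "bar/type", "foobar")
print(total)           # number of <type> elements that have a foobar attribute
print(dict(by_value))  # how often each foobar value occurs
```

Note that the path passed to findall is relative to the root element, so it starts at bar, not foo.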
minidom is quick to get started with and pretty straightforward.
XML:
    <data>
        <items>
            <item name="item1"></item>
            <item name="item2"></item>
            <item name="item3"></item>
            <item name="item4"></item>
        </items>
    </data>
Python:
    from xml.dom import minidom

    xmldoc = minidom.parse('items.xml')
    itemlist = xmldoc.getElementsByTagName('item')
    print(len(itemlist))
    print(itemlist[0].attributes['name'].value)
    for s in itemlist:
        print(s.attributes['name'].value)
Output:
    4
    item1
    item1
    item2
    item3
    item4
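If some elements might lack the attribute, indexing attributes['name'] raises a KeyError. A hedged variant that only counts elements which actually carry the attribute could look like the sketch below; the inline ITEMS_XML sample and the helper name attribute_values are hypothetical, while parseString, getElementsByTagName, hasAttribute and getAttribute are standard minidom calls.

```python
from xml.dom import minidom

# Hypothetical sample; the last <item> has no "name" attribute on purpose.
ITEMS_XML = """<data>
  <items>
    <item name="item1"></item>
    <item name="item2"></item>
    <item></item>
  </items>
</data>"""


def attribute_values(xml_text, tag="item", attr="name"):
    """Return the `attr` values of all `tag` elements that define it."""
    doc = minidom.parseString(xml_text)
    return [
        node.getAttribute(attr)
        for node in doc.getElementsByTagName(tag)
        if node.hasAttribute(attr)  # ignore elements missing the attribute
    ]


names = attribute_values(ITEMS_XML)
print(len(names))  # how many <item> elements have a name attribute
print(names)
```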
You can use BeautifulSoup:
    >>> from bs4 import BeautifulSoup
    >>> x = """<foo>
    ...    <bar>
    ...       <type foobar="1"/>
    ...       <type foobar="2"/>
    ...    </bar>
    ... </foo>"""
    >>> # Naming the parser explicitly avoids the "no parser specified" warning.
    >>> y = BeautifulSoup(x, "html.parser")
    >>> y.foo.bar.type["foobar"]
    '1'
    >>> y.foo.bar.find_all("type")
    [<type foobar="1"></type>, <type foobar="2"></type>]
    >>> y.foo.bar.find_all("type")[0]["foobar"]
    '1'
    >>> y.foo.bar.find_all("type")[1]["foobar"]
    '2'