How to use xmltodict to get items out of an xml file
Using your example:
    import xmltodict

    with open('artikelen.xml') as fd:
        doc = xmltodict.parse(fd.read())
If you examine doc, you'll see it's an OrderedDict, ordered by tag:
    >>> doc
    OrderedDict([('artikelen', OrderedDict([('artikel', [OrderedDict([('@nummer', '121'), ('code', 'ABC123'), ('naam', 'Highlight pen'), ('voorraad', '231'), ('prijs', '0.56')]), OrderedDict([('@nummer', '123'), ('code', 'PQR678'), ('naam', 'Nietmachine'), ('voorraad', '587'), ('prijs', '9.99')])])]))])
The root node is called artikelen, and it has a subnode artikel, which is a list of OrderedDict objects. So if you want the code for every article, you would do:
    codes = []
    for artikel in doc['artikelen']['artikel']:
        codes.append(artikel['code'])

    # >>> codes
    # ['ABC123', 'PQR678']
If you specifically want the code only when nummer is 121, you could do this:
    code = None
    for artikel in doc['artikelen']['artikel']:
        if artikel['@nummer'] == '121':
            code = artikel['code']
            break
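If you need to look up several articles by nummer, one option is to build an index once instead of looping each time. This is only a minimal sketch: doc below is written out by hand as a plain dict that mirrors the parsed output above (an assumption, so the snippet runs without the XML file), and code_by_nummer is just an illustrative name.

```python
# Hand-written stand-in for the parsed document shown above (assumption:
# the real `doc` returned by xmltodict.parse has the same shape).
doc = {
    'artikelen': {
        'artikel': [
            {'@nummer': '121', 'code': 'ABC123', 'naam': 'Highlight pen',
             'voorraad': '231', 'prijs': '0.56'},
            {'@nummer': '123', 'code': 'PQR678', 'naam': 'Nietmachine',
             'voorraad': '587', 'prijs': '9.99'},
        ]
    }
}

# Map each article's nummer attribute to its code.
code_by_nummer = {a['@nummer']: a['code'] for a in doc['artikelen']['artikel']}

print(code_by_nummer.get('121'))  # ABC123
print(code_by_nummer.get('999'))  # None (no such article)
```

Note that the keys are strings such as '121', because the attribute values in the parsed output are text, not integers.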
That said, if you're parsing XML documents and want to search for a specific value like that, I would consider using XPath expressions, which are supported by ElementTree.
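As a rough illustration of that suggestion, the standard library's xml.etree.ElementTree accepts a limited XPath subset, including attribute predicates. The XML string below is an assumption: it is reconstructed from the parsed output shown earlier, not copied from the original artikelen.xml.

```python
import xml.etree.ElementTree as ET

# Sample data reconstructed from the parsed output above (assumption:
# the real file has this structure).
XML = """
<artikelen>
  <artikel nummer="121">
    <code>ABC123</code><naam>Highlight pen</naam>
    <voorraad>231</voorraad><prijs>0.56</prijs>
  </artikel>
  <artikel nummer="123">
    <code>PQR678</code><naam>Nietmachine</naam>
    <voorraad>587</voorraad><prijs>9.99</prijs>
  </artikel>
</artikelen>
"""

root = ET.fromstring(XML)  # root is the <artikelen> element

# Select the <code> child of the <artikel> whose nummer attribute is '121'.
node = root.find("artikel[@nummer='121']/code")
code = node.text if node is not None else None

print(code)  # ABC123
```

The path is relative to the root element, so it starts at artikel rather than /artikelen. find returns None when nothing matches, which is why the result is checked before reading .text.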
This is using xml.etree. You can try this:
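    import xml.etree.ElementTree as ET

    root = ET.parse('artikelen.xml').getroot()
    for artikelobj in root.findall('artikel'):
        print(artikelobj.find('code').text)  # .text gives the value, not the Element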
If you want to extract a specific code based on the attribute 'nummer' of artikel, then you can try this:
    for artikelobj in root.findall('artikel'):
        # Attribute values are strings, so compare against '121', not 121.
        if artikelobj.get('nummer') == '121':
            print(artikelobj.find('code').text)
This will print only the code you want.
You can use the lxml package with an XPath expression.
    from lxml import etree

    f = open("8_1.html", "r")
    tree = etree.parse(f)
    expression = "/artikelen/artikel[1]/code"
    l = tree.xpath(expression)        # list of matching elements
    code = next(i.text for i in l)    # text of the first match
    print(code)
    # ABC123
The thing to notice here is the expression. /artikelen is the root element. /artikel[1] chooses the first artikel element under the root (notice that XPath indexes start at 1, not 0). /code is the child element under artikel[1]. You can read more about it in the lxml documentation and the XPath syntax reference.