How to use xmltodict to get items out of an xml file


Using your example:

import xmltodict

with open('artikelen.xml') as fd:
    doc = xmltodict.parse(fd.read())

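If you examine doc, you'll see it's an OrderedDict keyed by tag name, with XML attributes stored under keys prefixed with @ (on Python 3.7+, recent xmltodict releases return plain dict objects instead, which are indexed the same way):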

>>> doc
OrderedDict([('artikelen',
              OrderedDict([('artikel',
                            [OrderedDict([('@nummer', '121'),
                                          ('code', 'ABC123'),
                                          ('naam', 'Highlight pen'),
                                          ('voorraad', '231'),
                                          ('prijs', '0.56')]),
                             OrderedDict([('@nummer', '123'),
                                          ('code', 'PQR678'),
                                          ('naam', 'Nietmachine'),
                                          ('voorraad', '587'),
                                          ('prijs', '9.99')])])]))])

The root node is called artikelen, and it has a subnode artikel, which is a list of OrderedDict objects. So if you want the code for every article, you would do:

codes = []
for artikel in doc['artikelen']['artikel']:
    codes.append(artikel['code'])

# >>> codes
# ['ABC123', 'PQR678']

If you specifically want the code only when nummer is 121, you could do this:

code = None
for artikel in doc['artikelen']['artikel']:
    if artikel['@nummer'] == '121':
        code = artikel['code']
        break
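One caveat with the loops above: xmltodict only produces a list under artikel when the tag occurs more than once; a document with a single artikel yields one mapping instead. Below is a minimal sketch of a lookup that tolerates both shapes. The doc literal is a stand-in for the result of xmltodict.parse, and the helper names as_list and code_by_nummer are hypothetical, not part of xmltodict.

```python
# Stand-in for the result of xmltodict.parse(); plain dicts are indexed the
# same way as OrderedDict. This literal is an assumption for illustration.
doc = {
    'artikelen': {
        'artikel': [
            {'@nummer': '121', 'code': 'ABC123', 'naam': 'Highlight pen',
             'voorraad': '231', 'prijs': '0.56'},
            {'@nummer': '123', 'code': 'PQR678', 'naam': 'Nietmachine',
             'voorraad': '587', 'prijs': '9.99'},
        ]
    }
}


def as_list(value):
    """Wrap a single mapping in a list so callers can always iterate."""
    return value if isinstance(value, list) else [value]


def code_by_nummer(doc, nummer):
    """Return the code of the artikel with the given nummer, or None."""
    for artikel in as_list(doc['artikelen']['artikel']):
        if artikel['@nummer'] == nummer:
            return artikel['code']
    return None


print(code_by_nummer(doc, '121'))  # ABC123
print(code_by_nummer(doc, '999'))  # None
```

xmltodict.parse also accepts a force_list argument, for example force_list=('artikel',), which makes the named tags always parse as lists and removes the need for a helper like as_list.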

That said, if you're parsing XML documents and want to search for a specific value like that, I would consider using XPath expressions; ElementTree in the standard library supports a useful subset of XPath.
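As a rough sketch of that XPath approach, the snippet below uses the standard library's xml.etree.ElementTree with an attribute predicate. The inline XML is an assumption, reconstructed from the parsed output shown earlier rather than taken from the real artikelen.xml.

```python
import xml.etree.ElementTree as ET

# Sample data reconstructed from the parsed output above (an assumption).
XML = """\
<artikelen>
    <artikel nummer="121">
        <code>ABC123</code>
        <naam>Highlight pen</naam>
        <voorraad>231</voorraad>
        <prijs>0.56</prijs>
    </artikel>
    <artikel nummer="123">
        <code>PQR678</code>
        <naam>Nietmachine</naam>
        <voorraad>587</voorraad>
        <prijs>9.99</prijs>
    </artikel>
</artikelen>
"""

root = ET.fromstring(XML)  # root is the <artikelen> element

# Paths are relative to root; the predicate filters on the nummer attribute.
code_element = root.find("artikel[@nummer='121']/code")
code = code_element.text if code_element is not None else None
print(code)  # ABC123

# Without a predicate, findall collects the code of every artikel.
all_codes = [element.text for element in root.findall("artikel/code")]
print(all_codes)  # ['ABC123', 'PQR678']
```

find returns None when nothing matches, so the result is checked before reading .text.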


This uses xml.etree. You can try this:

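import xml.etree.ElementTree as ET

root = ET.parse('artikelen.xml').getroot()  # root is the <artikelen> element
for artikelobj in root.findall('artikel'):
    print(artikelobj.find('code').text)  # .text gives the tag's content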

If you want to extract a specific code based on the 'nummer' attribute of artikel, you can try this:

for artikelobj in root.findall('artikel'):
    # Attribute values are strings, so compare against '121', not 121.
    if artikelobj.get('nummer') == '121':
        print(artikelobj.find('code').text)

This will print only the code you want.


You can also use the lxml package with an XPath expression.

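from lxml import etree

# etree.parse accepts a filename directly, so no explicit open() is needed.
tree = etree.parse("8_1.html")
expression = "/artikelen/artikel[1]/code"
matches = tree.xpath(expression)  # list of matching <code> elements
code = next(match.text for match in matches)
print(code)
# ABC123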

The thing to notice here is the expression. /artikelen is the root element. /artikel[1] selects the first artikel element under the root (note that XPath positions start at 1, not 0). /code is the child element of artikel[1]. You can read more in the lxml documentation and in references on XPath syntax.
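To make the one-based indexing concrete, here is a small sketch. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages, and the inline XML is a trimmed-down assumption based on the data shown earlier.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample document (an assumption for illustration).
root = ET.fromstring(
    "<artikelen>"
    "<artikel nummer='121'><code>ABC123</code></artikel>"
    "<artikel nummer='123'><code>PQR678</code></artikel>"
    "</artikelen>"
)

# XPath positions are one-based: [1] is the first artikel, [2] the second.
first_code = root.find("artikel[1]/code").text
second_code = root.find("artikel[2]/code").text

# The Python list returned by findall is zero-based, as usual.
first_via_list = root.findall("artikel")[0].find("code").text

print(first_code, second_code)  # ABC123 PQR678
```

The same path position therefore differs by one from the matching Python list index.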