How to get all the info in XML into dictionary with Python


You can use the untangle library. untangle.parse() takes an XML file as input and returns a Python object that represents the document.

Let's take the following XML file as an example and name it test_xml.xml:

    <A>
     <B>
      <C>"blah1"</C>
      <C>"blah2"</C>
     </B>
     <B>
      <C>"blah3"</C>
      <C>"blah4"</C>
     </B>
    </A>

Now let's convert the above XML file into a Python object and access its elements:

    >>> import untangle
    >>> input_file = "/home/tests/test_xml.xml"  # Full path to your XML file
    >>> obj = untangle.parse(input_file)
    >>> obj.A.B[0].C[0].cdata
    u'"blah1"'
    >>> obj.A.B[0].C[1].cdata
    u'"blah2"'
    >>> obj.A.B[1].C[0].cdata
    u'"blah3"'
    >>> obj.A.B[1].C[1].cdata
    u'"blah4"'

The u prefix in the output comes from Python 2; Python 3 prints the same strings without it.


I usually use the lxml.objectify library for quick XML parsing.

With your XML string, you can do:

    from lxml import objectify
    root = objectify.fromstring(xml_string)

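And then get individual elements using a dictionary interface. Note that objectify.fromstring() returns the root element itself, so when <A> is the root, lookups start at its children:

    value = root["B"][0]["C"][0]

Or, if you prefer:

    value = root.B[0].C[0]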


I usually parse XML using the ElementTree module in the standard library. It does not give you a dictionary. Instead, you get a much more useful DOM-like tree structure, in which iterating over an element yields its children.

    from xml.etree import ElementTree as ET

    tree = ET.parse("<path-to-xml-file>")
    root_element = tree.getroot()
    for child in root_element:
        ...
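As a minimal sketch of that iteration, assuming a document with an <A> root, <B> children and <C> leaves, the text of every <C> can be collected per <B>. The sample is parsed from a string with ET.fromstring() rather than from a file, and the names XML_TEXT and values_per_b are only illustrative:

```python
from xml.etree import ElementTree as ET

# Sample document (assumed structure: A > B > C).
XML_TEXT = """<A>
 <B>
  <C>"blah1"</C>
  <C>"blah2"</C>
 </B>
 <B>
  <C>"blah3"</C>
  <C>"blah4"</C>
 </B>
</A>"""

root_element = ET.fromstring(XML_TEXT)

# Iterating over an element yields its direct children in document order.
values_per_b = []
for b_element in root_element:
    values_per_b.append([c_element.text for c_element in b_element])

print(root_element.tag)
print(values_per_b)
```

Each inner list holds the text of the <C> elements under one <B>, so no dictionary is needed to reach the values.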

If there is a specific need to parse it into a dictionary, instead of getting the information you need from the DOM tree, a recursive function that builds one from the root node would be something like:

    def xml_dict(node, path="", dic=None):
        """Flatten an ElementTree node into a dict keyed by dotted, numbered paths."""
        if dic is None:
            dic = {}
        name_prefix = path + ("." if path else "") + node.tag
        # Collect the indices already used by earlier siblings with the same tag.
        numbers = {0}
        for similar_name in dic:
            if similar_name.startswith(name_prefix):
                suffix = similar_name[len(name_prefix):].split(".")[0]
                if suffix.isdigit():  # skip longer tags that merely share the prefix
                    numbers.add(int(suffix))
        index = max(numbers) + 1
        name = name_prefix + str(index)
        # node.text and childnode.tail are None when there is no text.
        dic[name] = (node.text or "") + "<...>".join(
            childnode.tail or "" for childnode in node)
        for childnode in node:
            xml_dict(childnode, name, dic)
        return dic

For the XML you list above this yields this dictionary:

    {'A1': '\n \n <...>\n',
     'A1.B1': '\n  \n  <...>\n ',
     'A1.B1.C1': '"blah"',
     'A1.B1.C2': '"blah"',
     'A1.B2': '\n  \n  <...>\n ',
     'A1.B2.C1': '"blah"',
     'A1.B2.C2': '"blah"'}

(I find the DOM form more useful)
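If a nested dictionary is preferred over flat dotted keys, one possible sketch is shown here. The helper name element_to_dict is hypothetical (it is not part of ElementTree), and it makes simplifying assumptions: attributes and mixed text are ignored, leaf text is stripped, and children are always grouped into lists per tag.

```python
from xml.etree import ElementTree as ET


def element_to_dict(element):
    """Illustrative helper: convert an element into nested dicts and lists.

    Leaf elements become their stripped text. Children are grouped by tag
    into lists, so repeated tags are preserved in document order.
    """
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        result.setdefault(child.tag, []).append(element_to_dict(child))
    return result


xml_text = (
    '<A><B><C>"blah1"</C><C>"blah2"</C></B>'
    '<B><C>"blah3"</C><C>"blah4"</C></B></A>'
)
root = ET.fromstring(xml_text)
nested = {root.tag: element_to_dict(root)}
print(nested)
```

Always using lists keeps the shape predictable: nested["A"]["B"][0]["C"][1] works whether a tag occurs once or many times.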