How to get all the info in XML into a dictionary with Python
You can use the untangle library in Python. untangle.parse() converts an XML document into a Python object:
it takes an XML file as input and returns a Python object that represents that XML document.
Let's take the following XML file as an example and name it test_xml.xml:
    <A>
      <B>
        <C>"blah1"</C>
        <C>"blah2"</C>
      </B>
      <B>
        <C>"blah3"</C>
        <C>"blah4"</C>
      </B>
    </A>
Now let's convert the above XML file into a Python object and access its elements:
    >>> import untangle
    >>> input_file = "/home/tests/test_xml.xml"  # Full path to your xml file
    >>> obj = untangle.parse(input_file)
    >>> obj.A.B[0].C[0].cdata
    u'"blah1"'
    >>> obj.A.B[0].C[1].cdata
    u'"blah2"'
    >>> obj.A.B[1].C[0].cdata
    u'"blah3"'
    >>> obj.A.B[1].C[1].cdata
    u'"blah4"'
I usually use the lxml.objectify library for quick XML parsing.
With your XML string, you can do:
    from lxml import objectify
    root = objectify.fromstring(xml_string)
And then get individual elements using a dictionary interface (root is the <A> element itself, so lookups start from its children):
    value = root["B"][0]["C"][0]
Or, if you prefer:
    value = root.B[0].C[0]
I usually parse XML using the ElementTree module in the standard library. It does not give you a dictionary; instead you get a much more useful DOM-like structure that lets you iterate over each element's children.
    from xml.etree import ElementTree as ET
    xml = ET.parse("<path-to-xml-file>")
    root_element = xml.getroot()
    for child in root_element:
        ...
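As a rough illustration of the iteration idea, here is a minimal sketch that inlines the sample document from the untangle example as a string (an assumption made so the snippet runs without a file on disk) and collects the text of each <C> element, grouped by its parent <B>. The variable names are illustrative only.

```python
from xml.etree import ElementTree as ET

# Sample document from the untangle example, inlined as a string (assumption:
# parsing from a string instead of a file keeps the sketch self-contained).
xml_string = """<A>
  <B> <C>"blah1"</C> <C>"blah2"</C> </B>
  <B> <C>"blah3"</C> <C>"blah4"</C> </B>
</A>"""

# fromstring() returns the root element directly, here the <A> element.
root_element = ET.fromstring(xml_string)

# Walk the direct <B> children and gather the text of their <C> children.
values = [[c.text for c in b.findall("C")] for b in root_element.findall("B")]
print(values)  # [['"blah1"', '"blah2"'], ['"blah3"', '"blah4"']]
```

Note that findall("C") only matches direct children; root_element.iter("C") would visit every <C> in the tree regardless of depth.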
If there is a specific need to parse it into a dictionary, instead of getting the information you need from the DOM tree, a recursive function that builds one from the root node would be something like:
    def xml_dict(node, path="", dic=None):
        if dic is None:
            dic = {}
        name_prefix = path + ("." if path else "") + node.tag
        # Find the indexes already used by siblings with the same tag
        numbers = set()
        for similar_name in dic.keys():
            if similar_name.startswith(name_prefix):
                numbers.add(int(similar_name[len(name_prefix):].split(".")[0]))
        if not numbers:
            numbers.add(0)
        index = max(numbers) + 1
        name = name_prefix + str(index)
        # node.text is None for elements with no leading text, so fall back to ""
        dic[name] = (node.text or "") + "<...>".join(
            childnode.tail if childnode.tail is not None else ""
            for childnode in node)
        for childnode in node:
            xml_dict(childnode, name, dic)
        return dic
For the XML you list above, this yields the following dictionary:
    {'A1': '\n \n <...>\n',
     'A1.B1': '\n \n <...>\n ',
     'A1.B1.C1': '"blah"',
     'A1.B1.C2': '"blah"',
     'A1.B2': '\n \n <...>\n ',
     'A1.B2.C1': '"blah"',
     'A1.B2.C2': '"blah"'}
(I find the DOM form more useful)
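If a nested dictionary is preferred over flat dotted keys, one possible variant is sketched below. This is a different technique from the function above, not a rewrite of it: the helper name xml_to_nested_dict is hypothetical, each element with children maps to a dict of tag to list of children, and leaf elements map to their stripped text. It deliberately ignores attributes and any text mixed in between child elements, so treat it as a starting point only.

```python
from xml.etree import ElementTree as ET


def xml_to_nested_dict(node):
    """Hypothetical helper: leaf -> stripped text, otherwise {tag: [children...]}."""
    children = list(node)
    if not children:
        # Leaf element: keep only its text (None becomes an empty string).
        return (node.text or "").strip()
    result = {}
    for child in children:
        # Repeated tags are collected in a list, in document order.
        result.setdefault(child.tag, []).append(xml_to_nested_dict(child))
    return result


root = ET.fromstring(
    '<A><B><C>"blah1"</C><C>"blah2"</C></B>'
    '<B><C>"blah3"</C><C>"blah4"</C></B></A>'
)
nested = {root.tag: xml_to_nested_dict(root)}
print(nested["A"]["B"][0]["C"][1])  # "blah2" (including the quotes from the XML)
```

Always wrapping children in lists keeps access uniform (nested["A"]["B"][0]) whether a tag occurs once or many times, at the cost of an extra [0] for single elements.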