
How do I use xml namespaces with find/findall in lxml?


If root.nsmap contains the table namespace prefix, you could use:

root.xpath('.//table:table', namespaces=root.nsmap)

When no prefix mapping is passed, findall(path) accepts the {namespace}name syntax rather than prefix:name. In that case the path should be converted to the {namespace}name form, using a namespace dictionary, before it is passed to findall().
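The preprocessing step described above could be sketched roughly as follows. The helper name to_clark is hypothetical (it is not part of lxml), and it only handles simple paths made of prefix:name steps separated by slashes, with no predicates or axes.

```python
def to_clark(path, nsmap):
    """Rewrite 'prefix:name' steps as '{uri}name' (hypothetical helper, not an lxml API)."""
    steps = []
    for step in path.split('/'):
        prefix, sep, local = step.partition(':')
        # Only rewrite steps whose prefix is actually known in the mapping.
        if sep and prefix in nsmap:
            step = '{%s}%s' % (nsmap[prefix], local)
        steps.append(step)
    return '/'.join(steps)


# Example mapping; the URI is a placeholder chosen for illustration.
example_nsmap = {'table': 'urn:example:table'}
expanded = to_clark('.//table:table/table:table-row', example_nsmap)
```

The result can then be passed to findall(), for example root.findall(to_clark('.//table:table', nsmap)), assuming nsmap maps each prefix used in the path to its URI.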


Maybe the first thing to notice is that namespaces are defined at the Element level, not the Document level.

Most often, though, all namespaces are declared in the document's root element (office:document-content here), which saves us from parsing the whole document to collect inner xmlns scopes.

An element's nsmap then includes:

  • a default namespace, with the None prefix (not always present)
  • all of its ancestors' namespaces, unless overridden.
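As a small illustration of the two points above, here is a sketch using a made-up document; the urn:example:* URIs and element names are assumptions chosen for the example, not taken from an ODS file.

```python
from lxml import etree

# Made-up document: a default namespace, a prefix declared on the root,
# a new prefix declared further down, and an inner override of prefix 'a'.
xml = b'''<root xmlns="urn:example:default" xmlns:a="urn:example:a">
  <a:child xmlns:b="urn:example:b">
    <b:leaf xmlns:a="urn:example:a2"/>
  </a:child>
</root>'''

root = etree.fromstring(xml)
child = root[0]
leaf = child[0]

root_map = root.nsmap    # default namespace under the None key, plus 'a'
child_map = child.nsmap  # inherits from root and adds 'b'
leaf_map = leaf.nsmap    # 'a' now points at the overriding declaration
```

Each nsmap is an ordinary dict, so the None key for the default namespace can be inspected or filtered like any other entry.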

If, as ChrisR mentioned, the default namespace is not supported, you can use a dict comprehension to filter it out in a more compact expression.

The syntax differs slightly between xpath() and ElementPath (find/findall).
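To make the difference concrete without needing a spreadsheet file, here is a minimal sketch on an inline document; the t prefix, the urn:example:table URI, and the element names are assumptions for illustration only.

```python
from lxml import etree

NS = {'t': 'urn:example:table'}  # placeholder prefix mapping
doc = etree.fromstring(
    b'<doc xmlns:t="urn:example:table">'
    b'<t:table><t:row/><t:row/></t:table>'
    b'</doc>'
)

# ElementPath: prefixes resolved through the mapping passed to findall()
rows_find = doc.findall('.//t:row', namespaces=NS)

# ElementPath also understands the {namespace}name form, with no mapping at all
rows_clark = doc.findall('.//{urn:example:table}row')

# XPath: the mapping goes in the namespaces keyword argument
rows_xpath = doc.xpath('.//t:row', namespaces=NS)
```

All three calls select the same two row elements here. The {namespace}name form is specific to ElementPath and is not valid XPath syntax.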


So here's the code you could use to get all the rows of your first table (tested with lxml 3.4.2):

    import zipfile
    from lxml import etree

    # Open and parse the document
    zf = zipfile.ZipFile('spreadsheet.ods')
    tree = etree.parse(zf.open('content.xml'))

    # Get the root element
    root = tree.getroot()

    # Get its namespace map, excluding the default namespace
    # (use iteritems() instead of items() on Python 2)
    nsmap = {k: v for k, v in root.nsmap.items() if k}

    # Use the defined prefixes to access elements
    table = tree.find('.//table:table', nsmap)
    rows = table.findall('table:table-row', nsmap)

    # Or, if XPath is needed:
    table = tree.xpath('//table:table', namespaces=nsmap)[0]
    rows = table.xpath('table:table-row', namespaces=nsmap)


Here's a way to get all the namespaces in the XML document (assuming there is no prefix conflict).

I use this when parsing XML documents where I do not know the namespace URLs in advance, only the prefixes.

    from lxml import etree

    doc = etree.XML(XML_string)

    # Collect all the namespaces declared anywhere in the document.
    nsmap = {}
    for ns in doc.xpath('//namespace::*'):
        if ns[0]:  # Skip the None (default) namespace, neither needed nor supported.
            nsmap[ns[0]] = ns[1]

    doc.xpath('//prefix:element', namespaces=nsmap)