How do I use xml namespaces with find/findall in lxml?

python xml lxml xml-namespaces elementtree

If root.nsmap contains the table namespace prefix, then you could use:

root.xpath('.//table:table', namespaces=root.nsmap)

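Called with only a path, findall(path) expects the {namespace}name syntax (Clark notation) rather than namespace:name. The path should therefore be preprocessed with the namespace dictionary into the {namespace}name form before it is passed to findall(). Alternatively, pass the namespace dictionary as the second argument: findall(path, namespaces).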
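A minimal sketch of that preprocessing step, assuming simple paths whose steps are plain prefix:name tags (no predicates or axes). The to_clark helper, the urn:example:table URI, and the sample document are hypothetical and only there for illustration.

```python
try:
    from lxml import etree
except ImportError:  # the standard library understands the same Clark notation
    import xml.etree.ElementTree as etree


def to_clark(path, nsmap):
    """Rewrite each 'prefix:name' step of a simple path as '{uri}name'."""
    steps = []
    for step in path.split('/'):
        prefix, sep, name = step.partition(':')
        if sep and prefix in nsmap:
            step = '{%s}%s' % (nsmap[prefix], name)
        steps.append(step)
    return '/'.join(steps)


# Hypothetical namespace URI and document, for illustration only.
NSMAP = {'table': 'urn:example:table'}
root = etree.fromstring(
    '<doc xmlns:table="urn:example:table">'
    '<table:table><table:table-row/><table:table-row/></table:table>'
    '</doc>'
)

path = to_clark('.//table:table/table:table-row', NSMAP)
rows = root.findall(path)  # no namespace mapping needed any more
```

The path is split on '/' before any URI is substituted, so slashes inside a namespace URI do not confuse the helper.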


Maybe the first thing to notice is that namespaces are defined at the Element level, not the Document level.

Most often, though, all namespaces are declared in the document's root element (office:document-content here), which saves us from parsing the whole document to collect inner xmlns scopes.

An element's nsmap then includes:

- a default namespace, with the None prefix (not always present)
- all of its ancestors' namespaces, unless overridden.
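To make the scoping concrete, here is a small sketch with a made-up document (the URIs and element names are hypothetical). The child element inherits the root's declarations, adds one of its own, and rebinds the prefix a.

```python
from lxml import etree

# Hypothetical URIs and element names, for illustration only.
root = etree.fromstring(
    '<root xmlns="urn:example:default" xmlns:a="urn:example:a">'
    '<child xmlns:b="urn:example:b" xmlns:a="urn:example:a2"/>'
    '</root>'
)
child = root[0]

root_nsmap = root.nsmap    # declarations in scope on the root element
child_nsmap = child.nsmap  # inherited declarations plus the child's own

print(root_nsmap)
print(child_nsmap)
```

The root's map holds the None key for the default namespace plus a; the child's map additionally holds b, and its a points at the overriding URI.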

If, as ChrisR mentioned, the default namespace is not supported, you can filter it out compactly with a dict comprehension.

The syntax differs slightly between xpath() and ElementPath (the path language used by find() and findall()).
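A rough side-by-side of the two styles, using a hypothetical prefix, URI, and document. ElementPath accepts either a prefix mapping or Clark notation, while XPath expressions work with prefixes only (lxml offers etree.ETXPath for Clark notation in XPath).

```python
from lxml import etree

NS = {'t': 'urn:example:table'}  # hypothetical prefix and URI
root = etree.fromstring(
    '<doc xmlns:t="urn:example:table">'
    '<t:table><t:row/><t:row/></t:table>'
    '</doc>'
)

# ElementPath: prefix mapping as second argument, or Clark notation alone.
by_prefix = root.findall('.//t:row', NS)
by_clark = root.findall('.//{urn:example:table}row')

# XPath: prefixed names, with the mapping given as the namespaces keyword.
by_xpath = root.xpath('.//t:row', namespaces=NS)
```

All three calls select the same two row elements here.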

So here's the code you could use to get all the rows of your first table (tested with lxml 3.4.2):

    import zipfile
    from lxml import etree

    # Open and parse the document
    zf = zipfile.ZipFile('spreadsheet.ods')
    tree = etree.parse(zf.open('content.xml'))

    # Get the root element
    root = tree.getroot()

    # Get its namespace map, excluding the default namespace
    # (items() works on both Python 2 and 3; the original used iteritems())
    nsmap = {k: v for k, v in root.nsmap.items() if k}

    # Use the defined prefixes to access elements
    table = tree.find('.//table:table', nsmap)
    rows = table.findall('table:table-row', nsmap)

    # Or, if XPath is needed:
    table = tree.xpath('//table:table', namespaces=nsmap)[0]
    rows = table.xpath('table:table-row', namespaces=nsmap)


Here's a way to get all the namespaces in the XML document (assuming there are no prefix conflicts).

I use this when parsing XML documents where I do not know the namespace URLs in advance, only the prefixes.

    from lxml import etree

    doc = etree.XML(XML_string)

    # Get all the namespaces.
    nsmap = {}
    for ns in doc.xpath('//namespace::*'):
        if ns[0]:  # Skip the None namespace: neither needed nor supported.
            nsmap[ns[0]] = ns[1]

    doc.xpath('//prefix:element', namespaces=nsmap)
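If prefix conflicts cannot be ruled out, the same loop can report them instead of silently keeping one binding. This is only a sketch: the collect_nsmap helper and the sample document are hypothetical, and it relies on lxml returning namespace nodes as (prefix, URI) tuples.

```python
from lxml import etree


def collect_nsmap(doc):
    """Return (nsmap, conflicts) for the prefixed declarations in doc."""
    nsmap, conflicts = {}, set()
    for prefix, uri in doc.xpath('//namespace::*'):
        if not prefix:
            continue  # skip the default namespace (None prefix)
        if nsmap.setdefault(prefix, uri) != uri:
            conflicts.add(prefix)  # prefix already bound to another URI
    return nsmap, conflicts


# Hypothetical document in which the prefix 'p' is rebound on an inner element.
doc = etree.XML(
    '<root xmlns:p="urn:example:one">'
    '<child xmlns:p="urn:example:two"><p:item/></child>'
    '</root>'
)
nsmap, conflicts = collect_nsmap(doc)
```

A non-empty conflicts set means a single flat mapping cannot describe the whole document, so queries under the conflicting elements would need that element's own nsmap.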
