Remove namespace and prefix from xml in python using lxml

We can get the desired output document in two steps:

Remove namespace URIs from element names
Remove unused namespace declarations from the XML tree

Example code

from lxml import etreeinput_xml = """<package xmlns="http://apple.com/itunes/importer">  <provider>some data</provider>  <language>en-GB</language>  <!-- some comment -->  <?xml-some-processing-instruction ?></package>"""root = etree.fromstring(input_xml)# Iterate through all XML elementsfor elem in root.getiterator():    # Skip comments and processing instructions,    # because they do not have names    if not (        isinstance(elem, etree._Comment)        or isinstance(elem, etree._ProcessingInstruction)    ):        # Remove a namespace URI in the element's name        elem.tag = etree.QName(elem).localname# Remove unused namespace declarationsetree.cleanup_namespaces(root)print(etree.tostring(root).decode())

Output XML

<package>  <provider>some data</provider>  <language>en-GB</language>  <!-- some comment -->  <?xml-some-processing-instruction ?></package>

Details explaining the code

As described in the documentation, we use lxml.etree.QName.localname to get local names of elements, that is names without namespace URIs. Then we replace the fully qualified names of the elements by their local names.

Some XML elements, such as comments and processing instructions do not have names. So, we have to skip these elements while replacing element names, otherwise a ValueError will be raised.

Finally, we use lxml.etree.cleanup_namespaces() to remove unused namespace declarations from the XML tree.

python xml namespaces lxml

Replace tag as Uku Loskit suggests. In addition to that, use lxml.objectify.deannotate.

from lxml import etree, objectifymetadata = '/Users/user1/Desktop/Python/metadata.xml'parser = etree.XMLParser(remove_blank_text=True)tree = etree.parse(metadata, parser)root = tree.getroot()####    for elem in root.getiterator():    if not hasattr(elem.tag, 'find'): continue  # (1)    i = elem.tag.find('}')    if i >= 0:        elem.tag = elem.tag[i+1:]objectify.deannotate(root, cleanup_namespaces=True)####tree.write('/Users/user1/Desktop/Python/done.xml',           pretty_print=True, xml_declaration=True, encoding='UTF-8')

UPDATE

Some tags like Comment return a function when accessing tag attribute. added a guard for that. (1)

python xml namespaces lxml

import xml.etree.ElementTree as ETdef remove_namespace(doc, namespace):    """Remove namespace in the passed document in place."""    ns = u'{%s}' % namespace    nsl = len(ns)    for elem in doc.getiterator():        if elem.tag.startswith(ns):            elem.tag = elem.tag[nsl:]metadata = '/Users/user1/Desktop/Python/metadata.xml'tree = ET.parse(metadata)root = tree.getroot()remove_namespace(root, u'http://apple.com/itunes/importer')tree.write('/Users/user1/Desktop/Python/done.xml',       pretty_print=True, xml_declaration=True, encoding='UTF-8')

Used a snippet of code from hereThis method could be easily extended to delete any namespace attributes by searching for tags that begin with "xmlns"

CodeHunter

Remove namespace and prefix from xml in python using lxml

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last