What is a good XML stream parser for Python? [closed]

python xml parsing stream

There is a good answer about using xml.etree.ElementTree.iterparse on huge XML files. lxml provides the same method. The key to stream parsing with iterparse is to manually clear and remove the nodes you have already processed; otherwise the tree keeps growing and you will eventually run out of memory.
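A minimal sketch of that clearing pattern, using only the standard library. The function name `iter_entries` and the inline `SAMPLE` document are made up for illustration; the element names follow the sample stream used later in this answer.

```python
import io
import xml.etree.ElementTree as ET


def iter_entries(source):
    """Yield one dict per <entry>, discarding each element after use."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "entry":
            yield {
                "a": elem.findtext("a"),
                "b": elem.find("b").get("foo"),
            }
            # Drop the children processed so far; without this the whole
            # tree accumulates in memory as parsing goes on.
            root.clear()


# Illustrative in-memory document; a real stream would be a file or socket.
SAMPLE = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b"<root>"
    b"<entry><a>value 0</a><b foo='bar' /></entry>"
    b"<entry><a>value 1</a><b foo='baz' /></entry>"
    b"</root>"
)

entries = list(iter_entries(io.BytesIO(SAMPLE)))
print(entries)
```

Any file-like object opened in binary mode can be passed instead of the `BytesIO` wrapper.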

Another option is xml.sax. The official manual is too formal for me and lacks examples, so it needs clarification, as does the question. The default parser module, xml.sax.expatreader, implements the incremental parsing interface xml.sax.xmlreader.IncrementalParser. That is to say, xml.sax.make_parser() provides a suitable stream parser.
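One quick way to check that claim is to ask the parser whether it implements the incremental interface and then feed it a document in arbitrary chunks. This is only a sketch: `TagCounter` is a hypothetical handler, and it assumes the default expat-based parser is the one `make_parser()` returns.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import IncrementalParser


class TagCounter(ContentHandler):
    """Hypothetical handler that only counts start tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


parser = xml.sax.make_parser()
is_incremental = isinstance(parser, IncrementalParser)

counter = TagCounter()
parser.setContentHandler(counter)

document = b"<root><entry><a>value 0</a></entry></root>"
for i in range(0, len(document), 8):
    # Chunk boundaries may fall in the middle of a tag; the parser buffers.
    parser.feed(document[i:i + 8])
parser.close()  # signals end of input and completes the well-formedness check

print(is_incremental, counter.count)
```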

For instance, an XML stream like:

    <?xml version="1.0" encoding="utf-8"?>
    <root>
      <entry><a>value 0</a><b foo='bar' /></entry>
      <entry><a>value 1</a><b foo='baz' /></entry>
      <entry><a>value 2</a><b foo='quz' /></entry>
      ...
    </root>

can be handled in the following way:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import xml.sax
    import xml.sax.handler


    class StreamHandler(xml.sax.handler.ContentHandler):

        def __init__(self):
            super().__init__()
            # per-instance state: the entry being built and the open element
            self.lastEntry = None
            self.lastName = None

        def startElement(self, name, attrs):
            self.lastName = name
            if name == 'entry':
                self.lastEntry = {}
            elif name != 'root':
                self.lastEntry[name] = {'attrs': attrs, 'content': ''}

        def endElement(self, name):
            # text after a closing tag does not belong to that element
            self.lastName = None
            if name == 'entry':
                print({
                    'a': self.lastEntry['a']['content'],
                    'b': self.lastEntry['b']['attrs'].getValue('foo')
                })
                self.lastEntry = None
            elif name == 'root':
                # the document is complete, tell the caller to stop feeding
                raise StopIteration

        def characters(self, content):
            # may be called several times for one text node, so accumulate
            if self.lastEntry and self.lastName in self.lastEntry:
                self.lastEntry[self.lastName]['content'] += content


    if __name__ == '__main__':
        # use default ``xml.sax.expatreader``
        parser = xml.sax.make_parser()
        parser.setContentHandler(StreamHandler())

        # feed the parser with small chunks to simulate a stream
        with open('data.xml') as f:
            while True:
                buffer = f.read(16)
                if not buffer:
                    break  # end of input
                try:
                    parser.feed(buffer)
                except StopIteration:
                    break

        # if you can provide a file-like object it's as simple as
        with open('data.xml') as f:
            try:
                parser.parse(f)
            except StopIteration:
                pass  # raised by the handler at the end of <root>


Are you looking for xml.sax? It's right in the standard library.


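Use xml.etree.cElementTree. It's much faster than xml.etree.ElementTree. (Since Python 3.3, xml.etree.ElementTree uses the C accelerator automatically, and cElementTree was removed in Python 3.9.) Neither of them is broken. Your files are broken (see my answer to your other question).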
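If the same script has to run on both old and new interpreters, a common import fallback looks roughly like this. The inline document is made up for illustration.

```python
try:
    # C implementation on older Pythons
    import xml.etree.cElementTree as ET
except ImportError:
    # the cElementTree module no longer exists on newer Pythons
    import xml.etree.ElementTree as ET

root = ET.fromstring("<root><entry><a>value 0</a></entry></root>")
first_value = root.findtext("entry/a")
print(first_value)
```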
