How do I use Nokogiri::XML::Reader to parse large XML files?


Each element in the stream comes through as two events: one to open the element and one to close it. Self-closing elements such as <PMID/> are the exception and produce only the opening event. The opening event will have

node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT

and the closing event will have

node.node_type == Nokogiri::XML::Reader::TYPE_END_ELEMENT

The empty strings you're seeing come from the element closing events. Remember that with stream parsing (Reader is a pull parser, but the same holds for SAX), you're essentially walking through a tree, so you need that second event to tell you when you're going back up and closing an element.

You probably want something more like this:

pmids = [] # collect the text of each PMID element here
reader.each do |node|
  if node.name == "PMID" && node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
    pmids << node.inner_xml
  end
end

Or perhaps:

pmids = [] # collect the text of each PMID element here
reader.each do |node|
  next if node.name      != 'PMID'
  next if node.node_type != Nokogiri::XML::Reader::TYPE_ELEMENT
  pmids << node.inner_xml
end

Or some other variation on that.