grabbing text between all tags in Nokogiri?


doc = Nokogiri::HTML(your_html)
doc.xpath("//text()").to_s


Use a SAX parser. It is much faster than the XPath option.

require "nokogiri"

some_html = <<-HTML
<html>
  <head>
    <title>Title!</title>
  </head>
  <body>
    This is the body!
  </body>
</html>
HTML

class TextHandler < Nokogiri::XML::SAX::Document
  def initialize
    @chunks = []
  end

  attr_reader :chunks

  def cdata_block(string)
    characters(string)
  end

  def characters(string)
    @chunks << string.strip if string.strip != ""
  end
end

th = TextHandler.new
parser = Nokogiri::HTML::SAX::Parser.new(th)
parser.parse(some_html)
puts th.chunks.inspect


Just do:

doc = Nokogiri::HTML(your_html)
doc.xpath("//text()").text