Truncate Markdown?


  • Write/find an intelligent HTML truncating function

The following code, adapted from http://mikeburnscoder.wordpress.com/2006/11/11/truncating-html-in-ruby/ with some modifications, truncates HTML correctly and makes it easy to append a string before the closing tags.

>> puts "<p><b><a href=\"hi\">Something</a></p>".truncate_html(5, at_end = "...")
=> <p><b><a href="hi">Someth...</a></b></p>

The modified code:

require 'rexml/parsers/pullparser'

class String
  # Truncates HTML to roughly +len+ characters of text, appends +at_end+
  # (if given), and closes any tags that are still open.
  def truncate_html(len = 30, at_end = nil)
    p = REXML::Parsers::PullParser.new(self)
    tags = []
    new_len = len
    results = ''
    while p.has_next? && new_len > 0
      p_e = p.pull
      case p_e.event_type
      when :start_element
        tags.push p_e[0]
        results << "<#{tags.last}#{attrs_to_s(p_e[1])}>"
      when :end_element
        results << "</#{tags.pop}>"
      when :text
        # Inclusive range: keeps up to new_len + 1 characters of this text node.
        results << p_e[0][0..new_len]
        new_len -= p_e[0].length
      else
        results << "<!-- #{p_e.inspect} -->"
      end
    end
    # Append the trailing string, if one was given, before closing open tags.
    results << at_end if at_end
    tags.reverse.each do |tag|
      results << "</#{tag}>"
    end
    results
  end

  private

  # Serializes an attribute hash back into ' name="value" ...' form.
  def attrs_to_s(attrs)
    if attrs.empty?
      ''
    else
      ' ' + attrs.to_a.map { |attr| %{#{attr[0]}="#{attr[1]}"} }.join(' ')
    end
  end
end


Here's a solution that works for me with Textile.

  1. Convert it to HTML
  2. Truncate it.
  3. Remove any HTML tags that got cut in half with:

    html_string.gsub(/<[^>]*$/, "")
  4. Then use Hpricot to clean it up and close any unclosed tags:

    html_string = Hpricot( html_string ).to_s 

I do this in a helper, and with caching there's no performance issue.
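
For readers without Hpricot, the truncate, strip, and close steps above can be roughly sketched in plain Ruby. This is only an illustration under stated assumptions: `truncate_and_close` and `VOID_TAGS` are hypothetical names, the input is assumed to be already-converted HTML, and a simple regex-and-stack pass stands in for Hpricot's cleanup, so it will not cope with everything a real parser handles (comments, malformed attributes, and so on).

```ruby
# Elements that never take a closing tag (a deliberately short, assumed list).
VOID_TAGS = %w[br hr img input meta link].freeze

# Hypothetical helper: cut html to at most `limit` characters, drop a tag
# that was cut in half, then close whatever is still open.
def truncate_and_close(html, limit)
  # Steps 2 and 3: truncate, then remove a trailing partial tag.
  cut = html[0, limit].gsub(/<[^>]*$/, "")

  # Step 4 (stand-in for Hpricot): track open tags with a stack.
  open_tags = []
  cut.scan(%r{<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>}) do |closing, name, self_closing|
    name = name.downcase
    next if VOID_TAGS.include?(name) || self_closing == "/"
    if closing == "/"
      # Pop back to the matching opener, if there is one.
      idx = open_tags.rindex(name)
      open_tags.slice!(idx..-1) if idx
    else
      open_tags.push(name)
    end
  end

  # Close the remaining tags, innermost first.
  cut + open_tags.reverse.map { |tag| "</#{tag}>" }.join
end

puts truncate_and_close('<p>Hello <a href="/x">world</a> and more</p>', 20)
```

Here the cut lands in the middle of the `<a ...>` tag, so that fragment is dropped and only the `<p>` is closed. Note that `limit` counts markup characters as well as text, which is the main drawback of truncating the HTML string directly.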


You could use a regular expression to find a line consisting of nothing but "^" characters:

markdown_string = <<-eos
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
eos

preview = markdown_string[0...(markdown_string =~ /^\^+$/)]
puts preview
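
If some articles might not contain a marker line, the same idea can be wrapped in a small method that falls back to the full text. This is a minimal sketch; `preview_of` is a hypothetical name, and returning the whole string when no marker is found is an assumption about the desired behaviour.

```ruby
# Hypothetical helper: return the text before the first line made up only of
# "^" characters, or the whole text when no such marker line exists.
def preview_of(markdown)
  marker_at = markdown =~ /^\^+$/
  marker_at ? markdown[0...marker_at] : markdown
end

article = <<~EOS
  Intro shown on the index page.
  ^^^^^
  Body shown after "Read more".
EOS

puts preview_of(article)
```

The explicit check avoids passing `nil` as the end of the range, which older Ruby versions reject.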