Truncate Markdown?
- Write/find an intelligent HTML truncating function
The following code, taken from http://mikeburnscoder.wordpress.com/2006/11/11/truncating-html-in-ruby/ with some modifications, will correctly truncate HTML and makes it easy to append a string before the closing tags.
>> puts "<p><b><a href=\"hi\">Something</a></p>".truncate_html(5, at_end = "...")
=> <p><b><a href="hi">Someth...</a></b></p>
The modified code:
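require 'rexml/parsers/pullparser'

class String
  # Truncate HTML to roughly +len+ characters of text, closing any tags
  # left open. +at_end+, if given, is appended before the closing tags.
  def truncate_html(len = 30, at_end = nil)
    p = REXML::Parsers::PullParser.new(self)
    tags = []
    new_len = len
    results = ''
    while p.has_next? && new_len > 0
      p_e = p.pull
      case p_e.event_type
      when :start_element
        tags.push p_e[0]
        results << "<#{tags.last}#{attrs_to_s(p_e[1])}>"
      when :end_element
        results << "</#{tags.pop}>"
      when :text
        # The inclusive range keeps new_len + 1 characters,
        # which is why a length of 5 yields "Someth" above.
        results << p_e[0][0..new_len]
        new_len -= p_e[0].length
      else
        results << "<!-- #{p_e.inspect} -->"
      end
    end
    # Append the caller's string rather than a hard-coded "...".
    results << at_end if at_end
    # Close whatever tags are still open, innermost first.
    tags.reverse.each do |tag|
      results << "</#{tag}>"
    end
    results
  end

  private

  def attrs_to_s(attrs)
    if attrs.empty?
      ''
    else
      ' ' + attrs.to_a.map { |attr| %{#{attr[0]}="#{attr[1]}"} }.join(' ')
    end
  end
end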
Here's a solution that works for me with Textile.
- Convert it to HTML
- Truncate it.
- Remove any HTML tags that got cut in half:
html_string = html_string.gsub(/<[^>]*$/, "")
- Then use Hpricot to clean it up and close any unclosed tags:
html_string = Hpricot( html_string ).to_s
I do this in a helper, and with caching there's no performance issue.
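For anyone who wants to try the truncate-then-repair steps without installing Hpricot, here is a rough sketch using only the standard library. The helper names (`strip_partial_tag`, `close_open_tags`, `truncate_rendered_html`) and the `VOID_TAGS` list are made up for illustration, and the tag-closing scanner is a simplified stand-in for what Hpricot does, not a real HTML parser. It assumes the Textile or Markdown has already been converted to HTML.

```ruby
# Hypothetical helpers; a stdlib-only stand-in for the Hpricot step.
VOID_TAGS = %w[br hr img input meta link].freeze

# Drop a trailing tag that truncation cut in half, e.g. "<a hre".
def strip_partial_tag(html)
  html.gsub(/<[^>]*\z/, "")
end

# Append closing tags for every element still open at the end.
def close_open_tags(html)
  open_tags = []
  html.scan(%r{<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>}) do |closing, name, self_closing|
    name = name.downcase
    next if self_closing == "/" || VOID_TAGS.include?(name)
    if closing == "/"
      # Pop back to the matching opener, if there is one.
      idx = open_tags.rindex(name)
      open_tags.slice!(idx..-1) if idx
    else
      open_tags.push(name)
    end
  end
  html + open_tags.reverse.map { |t| "</#{t}>" }.join
end

# Cut the raw HTML at len characters (tags included), then repair it.
def truncate_rendered_html(html, len)
  close_open_tags(strip_partial_tag(html[0, len]))
end

html = '<p>Hello <b>bold</b> and <a href="x">link</a></p>'
puts truncate_rendered_html(html, 18)
```

Note that this counts markup characters toward the length, just like the plain truncate step in the list above, so the visible text will be shorter than `len`.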
You could use a regular expression to find a line consisting of nothing but "^" characters:
markdown_string = <<-eos
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
eos

preview = markdown_string[0...(markdown_string =~ /^\^+$/)]
puts preview
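One thing to watch with the marker approach: if the text has no "^" line, `=~` returns nil, and slicing with a nil range end raises an error on older Rubies (newer ones treat it as an endless range). A small wrapper can make that case explicit. This is only a sketch; `split_at_marker` is a hypothetical name, and falling back to the whole text as the preview is an assumption about what you would want.

```ruby
# Hypothetical helper: split a document at the first line made only of "^".
# Returns [preview, rest]; with no marker, the whole text is the preview.
def split_at_marker(text)
  match = /^\^+$\n?/.match(text)
  return [text, ""] unless match
  [match.pre_match, match.post_match]
end

preview, rest = split_at_marker("Intro line.\n^^^^\nRest of the article.\n")
puts preview
```

The second element gives you the "Read more" body without the marker line, so the same helper can serve both the index page and the full article view.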