Indenting generated markup in Jekyll/Ruby


Using a Liquid filter

I managed to make this work using a Liquid filter. There are a few caveats:

  • Your input must be clean. I had some curly quotes and non-printable characters that looked like whitespace in a few files (copypasta from Word or some such) and was seeing "Invalid byte sequence in UTF-8" as a Jekyll error.

  • It could break some things. I was using <i class="icon-file"></i> icons from Twitter Bootstrap. It replaced the empty tag with <i class="icon-file"/> and Bootstrap did not like that. Additionally, it screws up the Octopress {% codeblock %}s in my content. I didn't really look into why.

  • While this will clean the output of a Liquid variable such as {{ content }}, it does not actually solve the problem in the original post, which is to indent the HTML in the context of the surrounding HTML. The filter produces well-formatted HTML, but as a fragment that is not indented relative to the tags above it. If you want to format everything in context, use the Rake task instead of the filter.
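For the first caveat, here is a rough sketch of cleaning the input before it reaches the filter. It assumes the bad bytes came from a Windows-1252 source (which is what Word copypasta usually is, but that is a guess about your files), and `to_clean_utf8` is a made-up helper name, not part of Jekyll or Liquid.

```ruby
# Hypothetical helper (not part of Jekyll or Liquid): coerce text to valid UTF-8.
# Assumption: bytes that are not valid UTF-8 came from a Windows-1252 source.
def to_clean_utf8(text)
  utf8 = text.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Reinterpret the raw bytes as Windows-1252 and transcode them, replacing
  # anything that has no mapping instead of raising an error.
  text.dup
      .force_encoding(Encoding::Windows_1252)
      .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
end

# Curly quotes stored as single Windows-1252 bytes become real UTF-8 quotes.
puts to_clean_utf8("\x93quoted\x94".b)
```

You could run something like this over the source files once, or call it at the top of the filter. If your files are in some other legacy encoding, the transcoding step would need to change accordingly.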


    require 'rubygems'
    require 'json'
    require 'nokogiri'
    require 'nokogiri-pretty'

    module Jekyll
      module PrettyPrintFilter
        def pretty_print(input)
          # seeing some ASCII-8BIT come in
          input = input.encode("UTF-8")

          # Parsing with nokogiri first cleans up some things the XSLT can't handle
          content = Nokogiri::HTML::DocumentFragment.parse input
          parsed_content = content.to_html

          # Unfortunately nokogiri-pretty can't use DocumentFragments...
          html = Nokogiri::HTML parsed_content
          pretty = html.human

          # ...so now we need to remove the stuff it added to make valid HTML
          output = PrettyPrintFilter.strip_extra_html(pretty)
          output
        end

        def PrettyPrintFilter.strip_extra_html(html)
          # type declaration
          html = html.sub('<?xml version="1.0" encoding="ISO-8859-1"?>', '')

          # second <html> tag
          # NOTE: a bare `next` makes the block return nil, which gsub inserts
          # as '', so in practice the first match is removed as well.
          first = true
          html = html.gsub('<html>') do |match|
            if first == true
              first = false
              next
            else
              ''
            end
          end

          # first </html> tag
          html = html.sub('</html>', '')

          # second <head> tag
          first = true
          html = html.gsub('<head>') do |match|
            if first == true
              first = false
              next
            else
              ''
            end
          end

          # first </head> tag
          html = html.sub('</head>', '')

          # second <body> tag
          first = true
          html = html.gsub('<body>') do |match|
            if first == true
              first = false
              next
            else
              ''
            end
          end

          # first </body> tag
          html = html.sub('</body>', '')

          html
        end
      end
    end

    Liquid::Template.register_filter(Jekyll::PrettyPrintFilter)
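For the empty-tag caveat (the <i class="icon-file"/> problem), one possible workaround is to post-process the pretty-printed string and re-expand self-closed tags that are not HTML void elements. This is only a regex sketch: `expand_self_closed_tags` is a hypothetical helper, it assumes attribute values contain no literal ">" character, and it would also rewrite matching text inside code samples.

```ruby
# HTML void elements, which may legitimately be written without a closing tag.
VOID_ELEMENTS = %w[area base br col embed hr img input link meta source track wbr].freeze

# Hypothetical helper: turn <i class="x"/> back into <i class="x"></i>,
# leaving void elements such as <br/> untouched.
# Assumption: attribute values contain no literal ">" character.
def expand_self_closed_tags(html)
  html.gsub(%r{<([A-Za-z][A-Za-z0-9]*)((?:\s[^<>]*?)?)\s*/>}) do
    name = Regexp.last_match(1)
    attrs = Regexp.last_match(2)
    if VOID_ELEMENTS.include?(name.downcase)
      Regexp.last_match(0) # keep void elements exactly as they were
    else
      "<#{name}#{attrs}></#{name}>"
    end
  end
end

puts expand_self_closed_tags('<i class="icon-file"/>')
```

In the filter above, this could be applied to the string returned by strip_extra_html before handing it back to Liquid.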

Using a Rake task

I use a task in my Rakefile to pretty print the output after the Jekyll site has been generated.

    require 'nokogiri'
    require 'nokogiri-pretty'

    desc "Pretty print HTML output from Jekyll"
    task :pretty_print do
      # change public to _site or wherever your output goes
      html_files = File.join("**", "public", "**", "*.html")

      Dir.glob html_files do |html_file|
        puts "Cleaning #{html_file}"
        contents = File.read(html_file)

        begin
          # we're gonna parse it as XML so we can apply an XSLT
          html = Nokogiri::XML(contents)
          # the human() method is from nokogiri-pretty. Just an XSL transform on the XML.
          pretty_html = html.human
        rescue StandardError => msg
          puts "Failed to pretty print #{html_file}: #{msg}"
          # skip this file rather than overwriting it with nothing
          next
        end

        # Yep, we're overwriting the file. Potentially destructive.
        File.write(html_file, pretty_html)
      end
    end
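Since the task overwrites files in place, here is a small sketch of a less destructive write step. `overwrite_with_backup` and the ".bak" suffix are both made up for this example, and it only uses the Ruby standard library.

```ruby
require 'fileutils'

# Hypothetical helper: replace a file's contents, keeping the original next to
# it as "<name>.bak". The ".bak" suffix is an arbitrary choice for this sketch.
def overwrite_with_backup(path, new_contents)
  # Refuse to clobber a file with nothing, e.g. after a failed transform.
  return false if new_contents.nil? || new_contents.empty?

  FileUtils.cp(path, "#{path}.bak")
  File.write(path, new_contents)
  true
end
```

Inside the task, the final write could become overwrite_with_backup(html_file, pretty_html). Note that the backups land in the output directory, so you would probably want to delete them (or copy them somewhere else instead) before deploying.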


You can accomplish this by writing a custom Liquid filter to tidy the HTML, and then using {{ content | tidy }} to include it.

A quick search suggests that the Ruby tidy gem may not be maintained but that Nokogiri is the way to go. This will of course mean installing the nokogiri gem.

See the advice on writing Liquid filters and the Jekyll example filters.

An example might look something like this: in _plugins, add a script called tidy-html.rb containing:

    require 'nokogiri'

    module TextFilter
      def tidy(input)
        Nokogiri::HTML::DocumentFragment.parse(input).to_html
      end
    end

    Liquid::Template.register_filter(TextFilter)

(Untested)
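A tidy filter like this returns a fragment that starts at column zero, so it will not line up with the layout around it. One rough way to shift it is a second filter that prefixes every line with a fixed number of spaces. The module name `IndentFilter` and the filter name `indent` are my own choices for this sketch, not something Jekyll ships.

```ruby
# Hypothetical filter module; the names IndentFilter and `indent` are choices
# made for this sketch, not part of Jekyll or Liquid.
module IndentFilter
  # Prefix every non-blank line of `input` with `width` spaces.
  def indent(input, width = 2)
    pad = " " * width.to_i
    input.to_s.each_line.map { |line| line.strip.empty? ? line : pad + line }.join
  end
end

# Register with Liquid only when it is loaded, so the file also runs standalone.
Liquid::Template.register_filter(IndentFilter) if defined?(Liquid)
```

In a template this would be chained as {{ content | tidy | indent: 4 }}. The first line gets padded too, so the tag itself should sit at column zero in the layout, and anything inside <pre> blocks will be shifted along with everything else.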