Indenting generated markup in Jekyll/Ruby
Using a Liquid Filter
I managed to make this work using a liquid filter. There are a few caveats:
Your input must be clean. I had some curly quotes and non-printable chars that looked like whitespace in a few files (copypasta from Word or some such) and was seeing "Invalid byte sequence in UTF-8" as a Jekyll error.
It could break some things. I was using <i class="icon-file"></i> icons from Twitter Bootstrap. It replaced the empty tag with <i class="icon-file"/> and Bootstrap did not like that. Additionally, it screws up the Octopress {% codeblock %} blocks in my content. I didn't really look into why.
While this will clean the output of a Liquid variable such as {{ content }}, it does not actually solve the problem in the original post, which is to indent the HTML in context of the surrounding HTML. This will provide well-formatted HTML, but as a fragment that will not be indented relative to tags above the fragment. If you want to format everything in context, use the Rake task instead of the filter.
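For the first caveat, here is a minimal sketch of cleaning input before it reaches the filter. The module and method names are hypothetical, and it assumes the incoming bytes are meant to be UTF-8; files saved in another encoding would need a real transcoding step instead.

```ruby
# Hypothetical helper (not part of the original filter): coerce input to
# valid UTF-8 before handing it to Nokogiri.
module CleanInput
  NBSP = "\u00A0"

  def self.to_clean_utf8(input)
    text = input.dup
    # Assumption: the bytes are UTF-8 even if the string is tagged ASCII-8BIT.
    text.force_encoding(Encoding::UTF_8)
    # Replace byte sequences that are not valid UTF-8 instead of raising.
    text = text.scrub("?")
    # Turn non-breaking spaces (common in text pasted from Word) into plain spaces.
    text.gsub(NBSP, " ")
  end
end

raw = "caf\xE9 latte".b              # a lone Latin-1 byte, invalid as UTF-8
puts CleanInput.to_clean_utf8(raw)   # prints "caf? latte"
```

Calling something like this at the top of pretty_print would replace the bare encode call, at the cost of silently turning bad bytes into question marks.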
    require 'rubygems'
    require 'json'
    require 'nokogiri'
    require 'nokogiri-pretty'

    module Jekyll
      module PrettyPrintFilter
        def pretty_print(input)
          #seeing some ASCII-8 come in
          input = input.encode("UTF-8")

          #Parsing with nokogiri first cleans up some things the XSLT can't handle
          content = Nokogiri::HTML::DocumentFragment.parse input
          parsed_content = content.to_html

          #Unfortunately nokogiri-pretty can't use DocumentFragments...
          html = Nokogiri::HTML parsed_content
          pretty = html.human

          #...so now we need to remove the stuff it added to make valid HTML
          output = PrettyPrintFilter.strip_extra_html(pretty)
          output
        end

        def PrettyPrintFilter.strip_extra_html(html)
          #type declaration
          html = html.sub('<?xml version="1.0" encoding="ISO-8859-1"?>','')

          #second <html> tag
          first = true
          html = html.gsub('<html>') do |match|
            if first == true
              first = false
              next
            else
              ''
            end
          end

          #first </html> tag
          html = html.sub('</html>','')

          #second <head> tag
          first = true
          html = html.gsub('<head>') do |match|
            if first == true
              first = false
              next
            else
              ''
            end
          end

          #first </head> tag
          html = html.sub('</head>','')

          #second <body> tag
          first = true
          html = html.gsub('<body>') do |match|
            if first == true
              first = false
              next
            else
              ''
            end
          end

          #first </body> tag
          html = html.sub('</body>','')

          html
        end
      end
    end

    Liquid::Template.register_filter(Jekyll::PrettyPrintFilter)
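For the empty-tag caveat, one possible workaround is to post-process the pretty-printed output and expand self-closed tags again. This is only a regex-based sketch: the helper name and the list of void elements are my own, and attribute values containing < or > would confuse it.

```ruby
# Elements that are allowed to stay self-closed (HTML void elements).
VOID_ELEMENTS = %w[area base br col embed hr img input link meta param source track wbr].freeze

# Hypothetical helper: turn <i class="x"/> back into <i class="x"></i>,
# leaving void elements such as <br/> untouched.
def expand_empty_tags(html)
  html.gsub(%r{<([A-Za-z][A-Za-z0-9]*)((?:\s[^<>]*?)?)\s*/>}) do
    whole, name, attrs = $&, $1, $2
    if VOID_ELEMENTS.include?(name.downcase)
      whole
    else
      "<#{name}#{attrs}></#{name}>"
    end
  end
end

puts expand_empty_tags('<i class="icon-file"/><br/>')
# prints <i class="icon-file"></i><br/>
```

It could be applied to the string returned by strip_extra_html before the filter hands it back to Liquid.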
Using a Rake task
I use a task in my rakefile to pretty print the output after the jekyll site has been generated.
    require 'nokogiri'
    require 'nokogiri-pretty'

    desc "Pretty print HTML output from Jekyll"
    task :pretty_print do
      #change public to _site or wherever your output goes
      html_files = File.join("**", "public", "**", "*.html")

      Dir.glob html_files do |html_file|
        puts "Cleaning #{html_file}"

        contents = File.read(html_file)

        begin
          #we're gonna parse it as XML so we can apply an XSLT
          html = Nokogiri::XML(contents)
          #the human() method is from nokogiri-pretty. Just an XSL transform on the XML.
          pretty_html = html.human
        rescue StandardError => msg
          puts "Failed to pretty print #{html_file}: #{msg}"
          #skip the write so a failed transform does not empty the file
          next
        end

        #Yep, we're overwriting the file. Potentially destructive.
        File.write(html_file, pretty_html)
      end
    end
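Since the task overwrites the generated files in place, a cautious variation is to keep a copy of each file before writing. A minimal sketch, assuming a hypothetical helper named overwrite_with_backup that the task would call instead of writing directly; the .bak suffix is an arbitrary choice.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not from the original task): keep a ".bak" copy of the
# file, then replace its contents. Nil or empty output is refused so a failed
# pretty print cannot truncate the page.
def overwrite_with_backup(path, new_contents)
  return false if new_contents.nil? || new_contents.empty?

  FileUtils.cp(path, "#{path}.bak")
  File.write(path, new_contents)
  true
end

# Demonstration in a throwaway directory.
Dir.mktmpdir do |dir|
  page = File.join(dir, "index.html")
  File.write(page, "<p>hello</p>")

  overwrite_with_backup(page, "<p>\n  hello\n</p>\n")
  puts File.read("#{page}.bak")   # the original markup is still available
end
```

The backups land inside the output directory, so they would need to be excluded or deleted before deploying.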
We can accomplish this by writing a custom Liquid filter to tidy the HTML, and then using {{ content | tidy }} to include the HTML.
A quick search suggests that the ruby tidy gem may not be maintained but that nokogiri is the way to go. This will of course mean installing the nokogiri gem.
See advice on writing liquid filters, and Jekyll example filters.
An example might look something like this: in _plugins, add a script called tidy-html.rb containing:
    require 'nokogiri'

    module TextFilter
      def tidy(input)
        Nokogiri::HTML::DocumentFragment.parse(input).to_html
      end
    end

    Liquid::Template.register_filter(TextFilter)
(Untested)
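To get closer to indenting the fragment relative to the surrounding markup, the tidied output could be shifted right by a fixed number of spaces. A minimal sketch, assuming a hypothetical filter named indent (neither Liquid nor Jekyll ships a filter by that name as far as I know) that takes the offset as its argument:

```ruby
# Hypothetical companion filter; the name and argument are assumptions.
module IndentFilter
  # Prefix every non-empty line of the input with `depth` spaces.
  def indent(input, depth = 2)
    pad = " " * depth.to_i
    input.to_s.gsub(/^(?=.)/, pad)
  end
end

# Register only when Liquid is loaded, so the file also runs standalone.
Liquid::Template.register_filter(IndentFilter) if defined?(Liquid)

# Standalone demonstration.
helper = Object.new.extend(IndentFilter)
puts helper.indent("<ul>\n  <li>one</li>\n</ul>", 4)
```

In a layout this would be chained as {{ content | tidy | indent: 4 }}. The offset has to be chosen by hand for each layout, and it also shifts the contents of whitespace-sensitive elements such as pre.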