How to Pretty Print HTML to a file, with indentation


I ended up using BeautifulSoup directly. It is what lxml.html.soupparser uses for parsing HTML.

BeautifulSoup has a prettify() method that does exactly what its name says: it returns the HTML with each tag on its own line, indented by nesting depth.

BeautifulSoup will not fix the HTML, so broken code remains broken. In this case, though, the code is generated by lxml, so it should at least be well-formed.

For the example given in my question, that looks like this:

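    # lh and sliderRoot come from the question's code (lh is presumably lxml.html)
    # BeautifulSoup 3 spelled this import: from BeautifulSoup import BeautifulSoup as bs
    from bs4 import BeautifulSoup as bs

    root = lh.tostring(sliderRoot)    # convert the generated HTML to a string
    soup = bs(root, 'html.parser')    # make BeautifulSoup
    prettyHTML = soup.prettify()      # prettify the HTML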


Though my answer might not be helpful now, I am leaving it here as a reference for anybody else in the future.

lxml.html.tostring() indeed doesn't pretty-print the provided HTML, even with pretty_print=True.

However, lxml.etree, the "sibling" of lxml.html, handles it well.

So one might use it as follows:

    from lxml import etree, html

    document_root = html.fromstring("<html><body><h1>hello world</h1></body></html>")
    print(etree.tostring(document_root, encoding='unicode', pretty_print=True))

The output is like this:

    <html>
      <body>
        <h1>hello world</h1>
      </body>
    </html>
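
Since the question asks about writing to a file, here is a minimal sketch of saving that pretty-printed string to disk. The file name pretty.html and the temporary directory are assumptions made only to keep the example self-contained; substitute your own path.

```python
import os
import tempfile

from lxml import etree, html

# Parse the same snippet used in the answer above.
document_root = html.fromstring("<html><body><h1>hello world</h1></body></html>")

# encoding="unicode" makes tostring() return a str instead of bytes.
pretty_html = etree.tostring(document_root, encoding="unicode", pretty_print=True)

# "pretty.html" is a hypothetical file name chosen for this sketch.
output_path = os.path.join(tempfile.mkdtemp(), "pretty.html")
with open(output_path, "w", encoding="utf-8") as output_file:
    output_file.write(pretty_html)
```

Note that etree.tostring() serializes as XML by default, so empty elements such as br come out in the XML-style self-closing form. That is usually fine for inspection, but worth checking if the file is meant to be served as HTML.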


If the HTML is stored as an unformatted string in a variable html_string, it can be done with beautifulsoup4 as follows:

    from bs4 import BeautifulSoup

    print(BeautifulSoup(html_string, 'html.parser').prettify())
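
To get the result into a file rather than onto the console, the same call can be wrapped in a small helper. This is only a sketch: write_pretty_html, the sample html_string and the pretty.html path are hypothetical names introduced for illustration, not part of beautifulsoup4.

```python
import os
import tempfile

from bs4 import BeautifulSoup


def write_pretty_html(html_string, output_path):
    """Prettify html_string, write it to output_path, and return the text written."""
    pretty_html = BeautifulSoup(html_string, "html.parser").prettify()
    with open(output_path, "w", encoding="utf-8") as output_file:
        output_file.write(pretty_html)
    return pretty_html


# Sample input and a throwaway output location, both assumed for this sketch.
html_string = "<html><body><h1>hello world</h1></body></html>"
output_path = os.path.join(tempfile.mkdtemp(), "pretty.html")
pretty_html = write_pretty_html(html_string, output_path)
```

Unlike the lxml output, prettify() puts text nodes on their own lines and indents by one space per nesting level, so the two approaches produce differently shaped files from the same input.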