Custom indent width for BeautifulSoup .prettify()


I actually dealt with this myself, in the hackiest way possible: by post-processing the result.

    import re

    r = re.compile(r'^(\s*)', re.MULTILINE)

    def prettify_2space(s, encoding=None, formatter="minimal"):
        return r.sub(r'\1\1', s.prettify(encoding, formatter))
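To see what the substitution does in isolation, here is a minimal sketch that runs the same pattern over a hand-written string standing in for prettify() output. The sample strings and the names LEADING_WS and SPACES_ONLY are made up for illustration. It also shows one caveat: because \s matches newlines as well, a blank line followed by an indented line is captured as a single group, which a spaces-only pattern avoids.

```python
import re

# Same pattern as in prettify_2space: capture each line's leading whitespace.
LEADING_WS = re.compile(r'^(\s*)', re.MULTILINE)
# Hypothetical stricter variant: capture leading spaces only, never newlines.
SPACES_ONLY = re.compile(r'^( *)', re.MULTILINE)

# Hand-written stand-in for prettify() output: one space per nesting level.
sample = "<div>\n <p>\n  Hi\n </p>\n</div>\n"

# Writing the captured group twice doubles every line's indentation.
doubled = LEADING_WS.sub(r'\1\1', sample)
print(doubled)

# With a blank line in the input, the two patterns behave differently.
with_blank = "a\n\n b"
print(repr(LEADING_WS.sub(r'\1\1', with_blank)))   # the newline is duplicated too
print(repr(SPACES_ONLY.sub(r'\1\1', with_blank)))  # only the spaces are doubled
```

Ordinary prettify() output rarely contains blank lines, so the original pattern is usually fine; the stricter one is just a safer default.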

Actually, I monkeypatched prettify_2space in place of prettify in the class. That's not essential to the solution, but let's do it anyway, and make the indent width a parameter instead of hardcoding it to 2:

    import re
    import bs4

    orig_prettify = bs4.BeautifulSoup.prettify
    r = re.compile(r'^(\s*)', re.MULTILINE)

    def prettify(self, encoding=None, formatter="minimal", indent_width=4):
        return r.sub(r'\1' * indent_width, orig_prettify(self, encoding, formatter))

    bs4.BeautifulSoup.prettify = prettify

So:

    x = '''<section><article><h1></h1><p></p></article></section>'''
    soup = bs4.BeautifulSoup(x)
    print(soup.prettify(indent_width=3))

… gives:

    <html>
       <body>
          <section>
             <article>
                <h1>
                </h1>
                <p>
                </p>
             </article>
          </section>
       </body>
    </html>

Obviously if you want to patch Tag.prettify as well as BeautifulSoup.prettify, you have to do the same thing there. (You might want to create a generic wrapper that you can apply to both, instead of repeating yourself.) And if there are any other prettify methods, same deal.
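The generic wrapper mentioned above could look something like the following sketch. The name with_indent_width and the stand-in class FakeTag are hypothetical; FakeTag exists only so the example runs without bs4 installed. The sketch assumes the wrapped method returns str (no encoding requested) and that the original output uses one space per level.

```python
import re

# Leading spaces only, so newlines are never captured by the group.
_LEADING = re.compile(r'^( *)', re.MULTILINE)

def with_indent_width(orig_prettify, default_width=4):
    """Wrap a prettify-style method so it accepts an indent_width keyword."""
    def prettify(self, *args, indent_width=default_width, **kwargs):
        # Assumption: orig_prettify returns str, i.e. no encoding was passed.
        text = orig_prettify(self, *args, **kwargs)
        # Repeat each line's captured indent indent_width times.
        return _LEADING.sub(r'\1' * indent_width, text)
    return prettify

# Stand-in for a bs4 class; its output mimics one-space-per-level prettify().
class FakeTag:
    def prettify(self, encoding=None, formatter="minimal"):
        return "<a>\n <b>\n </b>\n</a>\n"

# The same line would be repeated for each class whose prettify you patch.
FakeTag.prettify = with_indent_width(FakeTag.prettify)
print(FakeTag().prettify(indent_width=3))
```

With bs4 itself, the assignment would presumably read `bs4.Tag.prettify = with_indent_width(bs4.Tag.prettify)`, applied once per class you want to patch.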


As far as I can tell, this feature is not built in, as there are a handful of solutions out there for this problem.

Assuming you are using BeautifulSoup 4, here are the solutions I came up with.

Hardcode it in. This requires minimal changes and is fine if you don't need a different indent in different circumstances. In the decode method of the Tag class (in element.py), make these changes:

    myTab = 4  # add this
    if pretty_print:
        # space = (' ' * (indent_level - 1))
        space = (' ' * (indent_level - myTab))
        # indent_contents = indent_level + 1
        indent_contents = indent_level + myTab

A drawback of the hardcoded approach is that text content won't be indented entirely consistently, although the result still looks reasonable. If you need a more flexible and consistent solution, you can modify the class itself.

Find the prettify function (it is located in the Tag class in element.py) and modify it as follows:

    # Add the myTab keyword to the function's parameters (or whatever you want
    # to call it) and set it to your preferred default.
    def prettify(self, encoding=None, formatter="minimal", myTab=2):
        Tag.myTab = myTab  # add a reference to it in the Tag class
        if encoding is None:
            return self.decode(True, formatter=formatter)
        else:
            return self.encode(encoding, True, formatter=formatter)

And then scroll up to the decode method in the Tag class and make the following changes:

    if pretty_print:
        # space = (' ' * (indent_level - 1))
        space = (' ' * (indent_level - Tag.myTab))
        # indent_contents = indent_level + 1
        indent_contents = indent_level + Tag.myTab

Then go to the decode_contents method in the Tag class and make these changes:

    # s.append(" " * (indent_level - 1))
    s.append(" " * (indent_level - Tag.myTab))

Now BeautifulSoup('<root><child><desc>Text</desc></child></root>').prettify(myTab=4) will return:

    <root>
        <child>
            <desc>
                Text
            </desc>
        </child>
    </root>

Note: there is no need to patch the BeautifulSoup class separately, since it inherits from the Tag class. Patching the Tag class is sufficient.


Here's a way to increase indentation without meddling with the original functions. Create the following function:

    # Increase indentation of 'text' by 'n' spaces
    def add_indent(text, n):
        sp = " " * n
        lsep = chr(10) if text.find(chr(13)) == -1 else chr(13) + chr(10)
        lines = text.split(lsep)
        for i in range(len(lines)):
            spacediff = len(lines[i]) - len(lines[i].lstrip())
            if spacediff:
                lines[i] = sp * spacediff + lines[i]
        return lsep.join(lines)

Then convert the text you obtained using the above function:

    import bs4

    x = '''<section><article><h1></h1><p></p></article></section>'''
    soup = bs4.BeautifulSoup(x, 'html.parser')  # I don't know if you need 'html.parser'
    text = soup.prettify()                      # I do, otherwise I get a warning
    text = add_indent(text, 1)  # Increase indentation by 1 space
    print(text)
    '''
    Output:
    <section>
      <article>
        <h1>
        </h1>
        <p>
        </p>
      </article>
    </section>
    '''
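One thing worth spelling out is the arithmetic: add_indent adds n spaces for every existing leading space, so with prettify()'s one-space-per-level output the final width is 1 + n spaces per level. The sketch below checks this on a hand-written sample string (made up for illustration, so it runs without bs4); the helper indent_of is hypothetical, and the function body is a lightly tidied copy of the one above.

```python
def add_indent(text, n):
    """Prepend n extra spaces for every existing leading space on each line."""
    sp = " " * n
    lsep = "\r\n" if "\r" in text else "\n"
    lines = text.split(lsep)
    for i, line in enumerate(lines):
        spacediff = len(line) - len(line.lstrip())
        if spacediff:
            lines[i] = sp * spacediff + line
    return lsep.join(lines)

def indent_of(line):
    """Hypothetical helper: count the leading spaces of a line."""
    return len(line) - len(line.lstrip(" "))

# Hand-written stand-in for prettify() output: one space per nesting level.
sample = "<a>\n <b>\n  <c>\n  </c>\n </b>\n</a>\n"

for n in (1, 3):
    result = add_indent(sample, n)
    widths = [indent_of(line) for line in result.split("\n") if line]
    # Each nesting level ends up (1 + n) spaces wide.
    print(n, widths)
```

So for a target width of w spaces per level, you would call add_indent(text, w - 1).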