How do I make JTidy make HTML documents well-formed?


You need to set several flags on Tidy if you want XML output:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.UnsupportedEncodingException;
    import org.w3c.tidy.Tidy;

    private String cleanData(String data) throws UnsupportedEncodingException {
        Tidy tidy = new Tidy();
        tidy.setInputEncoding("UTF-8");
        tidy.setOutputEncoding("UTF-8");
        tidy.setWraplen(Integer.MAX_VALUE);  // effectively disables line wrapping
        tidy.setPrintBodyOnly(true);         // emit only the contents of <body>
        tidy.setXmlOut(true);                // write the result as XML
        tidy.setSmartIndent(true);
        ByteArrayInputStream inputStream = new ByteArrayInputStream(data.getBytes("UTF-8"));
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        tidy.parseDOM(inputStream, outputStream);
        return outputStream.toString("UTF-8");
    }

Or, if you simply want XHTML output:

    Tidy tidy = new Tidy();
    tidy.setXHTML(true);
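
If you want to confirm that what Tidy produced really is well-formed, one option is to feed the output to the XML parser that ships with the JDK. The following is only a sketch: WellFormedCheck, isWellFormed and isWellFormedFragment are hypothetical names, not part of JTidy, and the code uses nothing but standard JDK classes. The fragment variant is there because body-only output (setPrintBodyOnly(true)) can contain several top-level elements, which is not a complete XML document on its own.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of JTidy): checks text with the JDK's XML parser.
class WellFormedCheck {

    // Returns true if the text parses as a complete XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Resolve any external DTD (e.g. the one named in an XHTML DOCTYPE) to an
            // empty stream instead of downloading it. Assumption: the input does not
            // rely on named entities such as &nbsp; that only the DTD declares.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            // Replace the default error handler, which prints to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input: not well-formed.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Wraps a body-only fragment in a dummy root element before checking it.
    static boolean isWellFormedFragment(String fragment) {
        return isWellFormed("<root>" + fragment + "</root>");
    }
}
```

You would call WellFormedCheck.isWellFormedFragment(...) on the string returned by a body-only method like cleanData above, and WellFormedCheck.isWellFormed(...) on a full document such as the one written with setXHTML(true). Whether a given page passes still depends on the input and on the Tidy options you chose.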


Use tidy.setXmlTags(true) to parse the input as XML instead of HTML.


Use Tidy.setForceOutput(true) (at your own risk) to generate the output even if errors are found.