How do I make JTidy produce well-formed HTML documents?
You need to set several flags on Tidy if you want XML output:
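import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import org.w3c.tidy.Tidy;

private String cleanData(String data) throws UnsupportedEncodingException {
    Tidy tidy = new Tidy();
    tidy.setInputEncoding("UTF-8");
    tidy.setOutputEncoding("UTF-8");
    tidy.setWraplen(Integer.MAX_VALUE); // effectively disables line wrapping
    tidy.setPrintBodyOnly(true);        // output only the contents of <body>
    tidy.setXmlOut(true);               // write the result as XML
    tidy.setSmartIndent(true);

    ByteArrayInputStream inputStream = new ByteArrayInputStream(data.getBytes("UTF-8"));
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    // Parse the input and write the cleaned markup to outputStream
    tidy.parseDOM(inputStream, outputStream);
    return outputStream.toString("UTF-8");
}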
Or, if you simply want XHTML output:
Tidy tidy = new Tidy();
tidy.setXHTML(true);
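If you want to confirm that the cleaned output really is well-formed, one option is to run it through the JDK's built-in XML parser. The following is only a rough sketch: the class name WellFormedCheck and the method isWellFormed are made up for illustration and are not part of JTidy. Note that with setPrintBodyOnly(true) the result may be a fragment with several top-level elements, so it would probably need to be wrapped in a single root element before it can pass this check.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of JTidy): checks well-formedness with the JDK XML parser.
final class WellFormedCheck {

    /** Returns true if the text parses as a single well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and recoverable errors but rethrows fatal
            // (well-formedness) errors, and keeps the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            // Resolve external entities (such as an XHTML DTD) to empty content
            // so the check does not go out to the network.
            builder.setEntityResolver((publicId, systemId) ->
                    new InputSource(new StringReader("")));
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser reported a well-formedness error.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser could not be used", e);
        }
    }
}
```

For example, WellFormedCheck.isWellFormed("<root>" + cleanData(html) + "</root>") would wrap a body-only fragment in a throwaway root element before checking it; the "root" name here is arbitrary.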