How to strip whitespace-only text nodes from a DOM before serialization?
You can find empty text nodes using XPath, then remove them programmatically like so:
XPathFactory xpathFactory = XPathFactory.newInstance();// XPath to find empty text nodes.XPathExpression xpathExp = xpathFactory.newXPath().compile( "//text()[normalize-space(.) = '']"); NodeList emptyTextNodes = (NodeList) xpathExp.evaluate(doc, XPathConstants.NODESET);// Remove each empty text node from document.for (int i = 0; i < emptyTextNodes.getLength(); i++) { Node emptyTextNode = emptyTextNodes.item(i); emptyTextNode.getParentNode().removeChild(emptyTextNode);}
This approach might be useful if you want more control over node removal than is easily achieved with an XSL template.
Try using the following XSL and the strip-space
element to serialize your DOM:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" omit-xml-declaration="yes"/> <xsl:strip-space elements="*"/> <xsl:template match="@*|node()"> <xsl:copy> <xsl:apply-templates select="@*|node()"/> </xsl:copy> </xsl:template></xsl:stylesheet>
http://helpdesk.objects.com.au/java/how-do-i-remove-whitespace-from-an-xml-document
Below code deletes the comment nodes and text nodes with all empty spaces. If the text node has some value, value will be trimmed
public static void clean(Node node){ NodeList childNodes = node.getChildNodes(); for (int n = childNodes.getLength() - 1; n >= 0; n--) { Node child = childNodes.item(n); short nodeType = child.getNodeType(); if (nodeType == Node.ELEMENT_NODE) clean(child); else if (nodeType == Node.TEXT_NODE) { String trimmedNodeVal = child.getNodeValue().trim(); if (trimmedNodeVal.length() == 0) node.removeChild(child); else child.setNodeValue(trimmedNodeVal); } else if (nodeType == Node.COMMENT_NODE) node.removeChild(child); }}
Ref: http://www.sitepoint.com/removing-useless-nodes-from-the-dom/