Add/remove xml tags using a bash script


This would not be difficult to do in sed, as sed also works on ranges.

Try this (assuming xml is in a file named foo.xml):

sed -i '/<b>/,/<\/b>/d' foo.xml

-i will write the change into the original file (use -i.bak to keep a backup copy of the original)

This sed command performs the action d (delete) on every line selected by the range:

# all of the lines between a line that matches <b>
# and the next line that matches <\/b>, inclusive
/<b>/,/<\/b>/

So, in plain English, this command will delete all of the lines between and including the line with <b> and the line with </b>
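The range delete can be tried end to end on a throwaway file. The sample document below is an assumption, loosely reconstructed from the element names that appear in the XSLT output further down; the temporary directory is only there to keep the experiment away from real files.

```shell
# Hypothetical sample input; element names borrowed from the output shown later.
tmpdir=$(mktemp -d)
cat > "$tmpdir/foo.xml" <<'EOF'
<a>
  <b>
    <bb>
      <yyy>Bla</yyy>
    </bb>
  </b>
  <c>
    <cc>Something</cc>
  </c>
  <d>bla</d>
</a>
EOF

# Delete every line from the one containing <b> through the one containing </b>.
# -i.bak edits in place and keeps the original as foo.xml.bak.
sed -i.bak '/<b>/,/<\/b>/d' "$tmpdir/foo.xml"

cat "$tmpdir/foo.xml"
```

Note that this is purely line-based. If `<b>` and `</b>` sit on the same line, sed only starts looking for the end of the range on the following line, so it keeps deleting until a later line containing `</b>` or the end of the file.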

If you'd rather comment out the lines, try one of these:

# block comment
sed -i 's/<b>/<!-- <b>/; s/<\/b>/<\/b> -->/' foo.xml
# comment out every line in the range
sed -i '/<b>/,/<\/b>/s/.*/<!-- & -->/' foo.xml
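Since the point of commenting out is to be able to put the element back, here is a minimal sketch of the block-comment variant together with its inverse. The function names `comment_b` and `uncomment_b` and the sample text are made up for illustration; both functions filter stdin to stdout rather than editing a file in place.

```shell
# Hypothetical helpers, not part of the original answer.
# Wrap the <b> element in a single block comment.
comment_b() {
  sed 's/<b>/<!-- <b>/; s/<\/b>/<\/b> -->/'
}

# Undo comment_b by stripping the comment markers again.
uncomment_b() {
  sed 's/<!-- <b>/<b>/; s/<\/b> -->/<\/b>/'
}

# Assumed sample input with <b> and </b> on their own lines.
sample='<a>
  <b>
    <bb>Bla</bb>
  </b>
  <c>Something</c>
</a>'

commented=$(printf '%s\n' "$sample" | comment_b)
restored=$(printf '%s\n' "$commented" | uncomment_b)

printf '%s\n' "$commented"
```

Running `comment_b` twice on the same input would nest the comment markers, which is not valid XML, so it is worth checking whether the element is already commented before applying it.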


Using xmlstarlet:

#xmlstarlet ed -d "/a/b" file.xml > tmp.xml
xmlstarlet ed -d "//b" file.xml > tmp.xml
mv tmp.xml file.xml
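As a variation, xmlstarlet's `ed` subcommand has a `-L` (`--inplace`) option that avoids the temporary file. The sketch below assumes xmlstarlet is installed under that name (some distributions ship the binary as `xml`) and uses a made-up single-line sample, which is exactly the kind of input the line-based sed approach cannot handle.

```shell
# Hypothetical sample: the whole document on one line.
workdir=$(mktemp -d)
cat > "$workdir/file.xml" <<'EOF'
<a><b><bb>Bla</bb></b><c>Something</c></a>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
  # -L (--inplace) rewrites file.xml directly, so no tmp.xml/mv step is needed.
  xmlstarlet ed -L -d "//b" "$workdir/file.xml"
  cat "$workdir/file.xml"
else
  echo "xmlstarlet not installed; skipping" >&2
fi
```

Because xmlstarlet parses the document instead of matching lines, the result does not depend on how the XML happens to be laid out.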


You can use an XSLT stylesheet such as this one, which is a modified identity transform. It copies all of the content by default and has an empty template for b that does nothing (effectively deleting b from the output):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!--Identity transform copies all items by default -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!--Empty template to match on b elements and prevent it from being copied to output -->
    <xsl:template match="b"/>

</xsl:stylesheet>

Create a bash script that executes the transform using Java and the Xalan command-line utility, like this:

java org.apache.xalan.xslt.Process -IN foo.xml -XSL foo.xsl -OUT foo.out
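A minimal sketch of such a script follows. The function name `run_transform`, its argument order, and the error messages are illustrative, not part of the original answer. It assumes the Xalan jars are already on the CLASSPATH and that the `java` process exits with a non-zero status when the transform fails.

```shell
# Hypothetical wrapper around the Xalan command line shown above.
run_transform() {
  if [ "$#" -ne 3 ]; then
    echo "usage: run_transform IN.xml STYLESHEET.xsl OUT.xml" >&2
    return 2
  fi
  local in_file=$1 xsl_file=$2 out_file=$3 f tmp

  # Fail early with a clear message if either input is missing.
  for f in "$in_file" "$xsl_file"; do
    if [ ! -r "$f" ]; then
      echo "run_transform: cannot read $f" >&2
      return 1
    fi
  done

  # Write to a temporary file first so a failed run never clobbers OUT.
  tmp=$(mktemp) || return 1
  if java org.apache.xalan.xslt.Process -IN "$in_file" -XSL "$xsl_file" -OUT "$tmp"; then
    mv "$tmp" "$out_file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

It would be called as `run_transform foo.xml foo.xsl foo.out`. Passing the same path for input and output is safe here because the result only replaces the output file after the transform has finished.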

The result is this:

<?xml version="1.0" encoding="UTF-16"?><a><c><cc>      Something    </cc></c><d>    bla  </d></a>

EDIT: if you would prefer to have the b commented out, to make it easier to put back, then use this stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!--Identity transform copies all items by default -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!--Match on b element, wrap in a comment and construct text representing XML structure by applying templates in "comment" mode -->
    <xsl:template match="b">
        <xsl:comment>
            <xsl:apply-templates select="self::*" mode="comment" />
        </xsl:comment>
    </xsl:template>

    <!--Write each element as text; '<' must be escaped as &lt; inside an attribute value -->
    <xsl:template match="*" mode="comment">
        <xsl:value-of select="'&lt;'"/>
        <xsl:value-of select="name()"/>
        <!--Attributes go inside the start tag, before the closing '>' -->
        <xsl:apply-templates select="@*" mode="comment" />
        <xsl:value-of select="'>'"/>
        <xsl:apply-templates select="node()" mode="comment" />
        <xsl:value-of select="'&lt;/'"/>
        <xsl:value-of select="name()"/>
        <xsl:value-of select="'>'"/>
    </xsl:template>

    <xsl:template match="text()" mode="comment">
        <xsl:value-of select="."/>
    </xsl:template>

    <!--Write each attribute as ' name="value"' -->
    <xsl:template match="@*" mode="comment">
        <xsl:text> </xsl:text>
        <xsl:value-of select="name()"/>
        <xsl:text>="</xsl:text>
        <xsl:value-of select="."/>
        <xsl:text>"</xsl:text>
    </xsl:template>

</xsl:stylesheet>

It produces this output:

<?xml version="1.0" encoding="UTF-16"?><a><!--<b><bb><yyy>            Bla        </yyy></bb></b>--><c><cc>      Something    </cc></c><d>    bla  </d></a>