How to remove XML tags from Unix command line?


If your file looks just like that, then sed can help you:

sed -e 's/<[^>]*>//g' file.xml

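Of course, regular expressions are not a robust way to parse XML: a pattern like this breaks on comments, CDATA sections, attribute values that contain >, and tags that span several lines.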
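To make that caveat concrete, here is a minimal sketch of one failure case and a possible workaround. The sample file, its contents, and the function names `naive` and `slurp` are made up for illustration, and the workaround assumes GNU sed.

```shell
# Hypothetical sample: the opening <item ...> tag is split across two lines.
sample=$(mktemp)
printf '%s\n' '<item' '  id="1">text</item>' > "$sample"

# Line-by-line substitution: sed never sees the split tag as a whole,
# so fragments of it leak into the output.
naive() { sed -e 's/<[^>]*>//g' "$1"; }

# Possible workaround (assumes GNU sed): append every line to the pattern
# space first, then run the substitution once over the whole file.
slurp() { sed -e ':a' -e '$!{N;ba' -e '}' -e 's/<[^>]*>//g' "$1"; }

naive "$sample"
slurp "$sample"
```

The first call leaves `<item` and `id="1">` behind, while the second prints only `text`. Reading the whole file into the pattern space is fine for small inputs, but it still does not make the regex aware of comments or CDATA.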


Using awk:

awk '{gsub(/<[^>]*>/,"")};1' file.xml
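The one-liner is terse, so here is a sketch of the same logic spelled out. The sample file and the function names `terse` and `explicit` are assumptions for illustration only.

```shell
# Hypothetical sample input (file name and contents are assumptions).
sample=$(mktemp)
printf '%s\n' '<to>Alice</to>' '<body>Hello <b>world</b></body>' > "$sample"

# Original one-liner: gsub() edits the current line in place, and the
# trailing 1 is an always-true pattern whose default action prints the line.
terse() { awk '{gsub(/<[^>]*>/,"")};1' "$1"; }

# The same logic with the target and the print written out explicitly.
explicit() { awk '{ gsub(/<[^>]*>/, "", $0); print $0 }' "$1"; }

terse "$sample"
explicit "$sample"
```

Both calls print `Alice` and `Hello world`. Like the sed version, awk works one line at a time, so a tag that spans lines is not removed.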


Give this a try:

grep -Po '<.*?>\K.*?(?=<.*?>)' inputfile

Explanation:

Using Perl Compatible Regular Expressions (-P, a GNU grep option) and outputting only the matched part of each line (-o):

  • <.*?> - Non-greedy match of any characters within angle brackets
  • \K - Don't include the preceding match in the output (reset match start - similar to positive look-behind, but it works with variable-length matches)
  • .*? - Non-greedy match of the text up to the next tag (this part will be output)
  • (?=<.*?>) - Non-greedy match of any characters within angle brackets and don't include the match in the output (positive look-ahead - works with variable-length matches)
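
As a rough illustration of how the pieces above behave together, the sketch below runs the command on a small made-up file. The sample contents and the function name `text_between_tags` are assumptions, and it needs a grep built with -P support (typically GNU grep).

```shell
# Hypothetical sample input (file name and contents are assumptions).
sample=$(mktemp)
printf '%s\n' '<note>' '  <to>Alice</to>' '  <body>Hello</body>' '</note>' > "$sample"

# Print only the text found between one tag and the next tag on the same line.
text_between_tags() { grep -Po '<.*?>\K.*?(?=<.*?>)' "$1"; }

text_between_tags "$sample"
```

For this input the command prints `Alice` and `Hello`. Unlike the sed and awk versions, lines that hold a single tag produce no output at all, and the leading indentation is dropped, because only the matched text is printed.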