How to format XML document in Linux


xmllint --format --recover nonformatted.xml > formatted.xml

For tab indentation:

export XMLLINT_INDENT=`echo -e '\t'`

For four space indentation:

export XMLLINT_INDENT='    '
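If xmllint is not available, the same choice between tab and four-space indentation can be sketched in Python. This is not what xmllint does internally. It is a minimal sketch that assumes Python 3.9 or newer and uses `xml.etree.ElementTree.indent` from the standard library; the function name `format_xml` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def format_xml(text, indent="    "):
    """Re-indent an XML string; `indent` plays the role of XMLLINT_INDENT."""
    root = ET.fromstring(text)
    # ET.indent rewrites whitespace-only text between elements in place
    # (available since Python 3.9).
    ET.indent(root, space=indent)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "<a><b>1</b><c/></a>"
    print(format_xml(sample, indent="\t"))  # tab indentation
    print(format_xml(sample))               # four-space indentation
```

Unlike `xmllint --recover`, this sketch raises `ParseError` on malformed input instead of trying to salvage it.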


Without programming you can use Eclipse XML Source Editor. Have a look at this answer

By the way, have you tried xmllint --format --recover nonformatted.xml > formatted.xml?

EDIT:

You can try this XMLStarlet Command Line XML Toolkit.

5. Formatting XML documents
====================================================
xml fo --help
XMLStarlet Toolkit: Format XML document
Usage: xml fo [<options>] <xml-file>
where <options> are
   -n or --noindent            - do not indent
   -t or --indent-tab          - indent output with tabulation
   -s or --indent-spaces <num> - indent output with <num> spaces
   -o or --omit-decl           - omit xml declaration <?xml version="1.0"?>
   -R or --recover             - try to recover what is parsable
   -D or --dropdtd             - remove the DOCTYPE of the input docs
   -C or --nocdata             - replace cdata section with text nodes
   -N or --nsclean             - remove redundant namespace declarations
   -e or --encode <encoding>   - output in the given encoding (utf-8, unicode...)
   -H or --html                - input is HTML
   -h or --help                - print help


I do it from gedit. In gedit, you can add any script, in particular a Python script, as an External Tool. The script reads data from stdin and writes output to stdout, so it can also be used as a stand-alone program. It lays out the XML and sorts child nodes.

#!/usr/bin/env python
# encoding: utf-8
"""
This is a gedit plug-in to sort and layout XML.

In gedit, to add this tool, open: menu -- Tools -- Manage External Tools...
Create a new tool: click [+] under the list of tools, type in "Sort XML" as tool name,
paste the whole text from this file in the "Edit:" box, then configure the tool:
Input: Current selection
Output: Replace current selection

In gedit, to run this tool,
FIRST SELECT THE XML,
then open: menu -- Tools -- External Tools > -- Sort XML
"""

from lxml import etree
import sys
import io

def headerFirst(node):
    """Return the sorting key prefix, so that 'header' will go before any other node
    """
    nodetag=('%s' % node.tag).lower()
    if nodetag.endswith('}header') or nodetag == 'header':
        return '0'
    else:
        return '1'

def get_node_key(node, attr=None):
    """Return the sorting key of an xml node
    using tag and attributes
    """
    if attr is None:
        return '%s' % node.tag + ':'.join([node.get(attr)
                                        for attr in sorted(node.attrib)])
    if attr in node.attrib:
        return '%s:%s' % (node.tag, node.get(attr))
    return '%s' % node.tag

def sort_children(node, attr=None):
    """ Sort children along tag and given attribute.
    if attr is None, sort along all attributes"""
    if not isinstance(node.tag, str):  # PYTHON 2: use basestring instead
        # not a TAG, it is comment or DATA
        # no need to sort
        return
    # sort child along attr
    node[:] = sorted(node, key=lambda child: (headerFirst(child) + get_node_key(child, attr)))
    # and recurse
    for child in node:
        sort_children(child, attr)

def sort(unsorted_stream, sorted_stream, attr=None):
    """Sort unsorted xml file and save to sorted_file"""
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(unsorted_stream,parser=parser)
    root = tree.getroot()
    sort_children(root, attr)

    sorted_unicode = etree.tostring(tree, pretty_print=True, xml_declaration=True, encoding="UTF-8")

    sorted_stream.write('%s' % sorted_unicode)

#we could do this,
#sort(sys.stdin, sys.stdout)
#but we want to check selection:

inputstr = ''
for line in sys.stdin:
    inputstr += line
if not inputstr:
    sys.stderr.write('no XML selected!')
    exit(100)

sort(io.BytesIO(inputstr), sys.stdout)

There are two tricky things:

    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(unsorted_stream,parser=parser)

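By default, whitespace-only text between elements is kept, which may produce a strange result: lxml does not re-indent elements that already contain such whitespace, so the old layout leaks into the output.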

    sorted_unicode = etree.tostring(tree, pretty_print=True, xml_declaration=True, encoding="UTF-8")

Again, by default there is no pretty-printing either.
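Both points can be checked with a small experiment. This is a minimal sketch, assuming Python 3 with lxml installed; `SAMPLE` and `pretty` are made-up names for illustration and are not part of the gedit tool.

```python
from lxml import etree

# Input that already contains newlines between elements but no indentation.
SAMPLE = b"<a>\n<b>1</b>\n<c/>\n</a>"


def pretty(data, remove_blank_text):
    """Parse `data` and serialize it with pretty_print=True."""
    parser = etree.XMLParser(remove_blank_text=remove_blank_text)
    root = etree.fromstring(data, parser)
    # tostring() returns bytes; decode for printing.
    return etree.tostring(root, pretty_print=True).decode("utf-8")


if __name__ == "__main__":
    print(pretty(SAMPLE, remove_blank_text=False))  # old layout is kept
    print(pretty(SAMPLE, remove_blank_text=True))   # children get indented
```

With `remove_blank_text=False` the existing whitespace text nodes stop lxml from re-indenting, so only the second call produces indented children.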

I configure this tool to work on the current selection and replace the current selection because usually there are HTTP headers in the same file, YMMV.

$ python --version
Python 2.7.6
$ lsb_release -a
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:    14.04
Codename:   trusty

If you do not need child node sorting, just comment the corresponding line out.
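For readers who only want to see the sorting idea in isolation, here is a rough sketch using the standard library instead of lxml. It assumes Python 3, and the names `node_key` and `sort_children` are illustrative. The key is a tuple rather than the concatenated string the script builds, so the ordering can differ from the gedit tool in edge cases.

```python
import xml.etree.ElementTree as ET


def node_key(node):
    """Sort key: 'header' elements first, then tag, then attribute values."""
    # Drop a '{namespace}' prefix, if any, before comparing with 'header'.
    local_name = node.tag.rsplit("}", 1)[-1].lower()
    header_rank = "0" if local_name == "header" else "1"
    attr_values = ":".join(node.get(name) for name in sorted(node.attrib))
    return (header_rank, node.tag, attr_values)


def sort_children(node):
    """Recursively sort the child elements of `node` in place."""
    node[:] = sorted(node, key=node_key)
    for child in node:
        sort_children(child)


if __name__ == "__main__":
    root = ET.fromstring('<root><b id="2"/><b id="1"/><Header/><a/></root>')
    sort_children(root)
    print(ET.tostring(root, encoding="unicode"))
```

The default ElementTree parser discards comments, so this sketch does not need the "not a TAG" check that the lxml version performs.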


UPDATE v2 places header in front of anything else; fixed spaces

UPDATE getting lxml on Ubuntu 18.04.3 LTS bionic:

sudo apt install python-pip
pip install --upgrade lxml
$ python --version
Python 2.7.15+