How to use xmlstarlet to extract HTML data by id
The HTML data has a default namespace that you have to declare in the xmlstarlet command:
xmlstarlet sel \
  -N n="http://www.w3.org/1999/xhtml" \
  -t \
  -c "/n:html/n:body/n:table[@id='test_table']/descendant::*/text()" \
  htmlfile 2>/dev/null
Once the <table> element is located, I use descendant::*/text() to extract all the text nodes beneath it, and 2>/dev/null to suppress the warning:
Attempt to load network entity http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
It yields:
testtestmo test
UPDATE: I didn't know it, but as the error message says, there is no need to declare the namespace with -N when it is the default one. You can refer to it with the built-in _ prefix, so this also works:
xmlstarlet sel \
  -t \
  -c "/_:html/_:body/_:table[@id='test_table']/descendant::*/text()" \
  htmlfile 2>/dev/null
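To try both forms without the original page, the following is a minimal sketch. It assumes a made-up XHTML file with no DOCTYPE (so no DTD warning appears); the helper name table_cells and the cell contents are hypothetical. It uses -m ... -v . -n instead of -c so that each cell is printed on its own line.

```shell
#!/bin/sh
# Hypothetical sample document; the contents are illustrative only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table id="test_table">
      <tr><td>alpha</td><td>beta</td></tr>
    </table>
  </body>
</html>
EOF

# Print the text of every <td> in the table with the given id, one per line.
# $1 = table id (must not contain a single quote), $2 = file name.
table_cells() {
    xmlstarlet sel -N n="http://www.w3.org/1999/xhtml" -t \
        -m "/n:html/n:body/n:table[@id='$1']//n:td" -v . -n "$2"
}

# Explicit prefix bound with -N.
table_cells test_table "$sample"

# Same query through the built-in _ prefix for the default namespace.
xmlstarlet sel -t \
    -m "/_:html/_:body/_:table[@id='test_table']//_:td" -v . -n "$sample"
```

Each of the two queries should print alpha and beta on separate lines. Dropping the prefixes (/html/body/table...) matches nothing in this file, because every element belongs to the XHTML namespace.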
As mentioned in the "Common problems" chapter of the user guide (http://xmlstar.sourceforge.net/doc/UG/ch05.html), when using the
-N x="http://www.w3.org/1999/xhtml"
option you also have to prefix the node selections with x:, e.g.
xmlstarlet sel \
  -N x="http://www.w3.org/1999/xhtml" \
  -t \
  -m "//x:pre" \
  -v . somehtml.html
will select all pre nodes and print their text.
You can try:
xmlstarlet ed --inplace -u "html/body/table[@id='your_table_id']/tr[@id='row_id']/td[@id='data_id']" -v NEW_VALUE_TO_BE_CHANGED HTMLFILE_NAME 2>/dev/null
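If the file declares the XHTML default namespace, as in the answers above, the unprefixed path would probably match nothing and the file would be left unchanged. The following is a hedged sketch of the same update with a bound prefix; the sample file, the prefix x, and the value "new" are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical sample document; ids mirror the placeholders used above.
page=$(mktemp)
cat > "$page" <<'EOF'
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table id="your_table_id">
      <tr id="row_id"><td id="data_id">old</td></tr>
    </table>
  </body>
</html>
EOF

# Update the cell in place; every step of the path carries the x: prefix.
xmlstarlet ed --inplace -N x="http://www.w3.org/1999/xhtml" \
    -u "/x:html/x:body/x:table[@id='your_table_id']/x:tr[@id='row_id']/x:td[@id='data_id']" \
    -v "new" "$page"

# Read the value back to confirm the edit.
xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -t \
    -v "//x:td[@id='data_id']" -n "$page"
```

Because --inplace overwrites the file, it is worth running the command once without it first and inspecting the output on stdout.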