
How to use xmlstarlet to extract HTML data by id


The HTML data has a default namespace (XHTML) that you have to declare in the xmlstarlet command:

xmlstarlet sel \
    -N n="http://www.w3.org/1999/xhtml" \
    -t \
    -c "/n:html/n:body/n:table[@id='test_table']/descendant::*/text()" \
    htmlfile 2>/dev/null

Once the <table> element is located, I use descendant::*/text() to extract all the text nodes inside it, and 2>/dev/null to suppress this warning:

Attempt to load network entity http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

It yields:

testtestmo test
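The texts run together in that output because -c copies the text nodes one after another with no separator. Below is a minimal sketch of printing one cell per line instead, using -m to iterate over the cells and -v with -n to print each value followed by a newline. The sample file and its contents (alpha, beta) are made up for illustration, since the original HTML file is not shown here; it has no DOCTYPE, so the DTD warning does not appear and 2>/dev/null is not needed.

```sh
# Hypothetical sample document standing in for the original htmlfile.
sample=$(mktemp "${TMPDIR:-/tmp}/sample.XXXXXX")
cat > "$sample" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>demo</title></head>
  <body>
    <table id="test_table">
      <tr><td>alpha</td><td>beta</td></tr>
    </table>
  </body>
</html>
EOF

# -m iterates over every matching <td>; -v prints its text; -n adds a newline.
cells=$(xmlstarlet sel \
    -N n="http://www.w3.org/1999/xhtml" \
    -t \
    -m "//n:table[@id='test_table']//n:td" \
    -v "normalize-space(.)" -n \
    "$sample")

# One line per cell: alpha, then beta.
printf '%s\n' "$cells"
rm -f "$sample"
```

normalize-space(.) is used so that indentation inside a cell does not end up in the output; plain -v . would keep it.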

UPDATE: I didn't know this, but as the error message says, there is no need to declare the namespace explicitly when it is the document's default one. The _ prefix refers to it, so this also works:

xmlstarlet sel \
    -t \
    -c "/_:html/_:body/_:table[@id='test_table']/descendant::*/text()" \
    htmlfile 2>/dev/null
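A quick way to check that your installed xmlstarlet binds the _ prefix is to run the same query both ways and compare the results. This is only a sketch: the sample file is invented for the test, and it assumes a version of xmlstarlet recent enough to support the _ shortcut.

```sh
# Hypothetical sample document with the XHTML default namespace.
sample=$(mktemp "${TMPDIR:-/tmp}/sample.XXXXXX")
cat > "$sample" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table id="test_table">
      <tr><td>alpha</td><td>beta</td></tr>
    </table>
  </body>
</html>
EOF

# Count the cells with an explicitly declared prefix...
with_n=$(xmlstarlet sel \
    -N n="http://www.w3.org/1999/xhtml" \
    -t -v "count(//n:table[@id='test_table']//n:td)" \
    "$sample")

# ...and with the built-in _ prefix for the default namespace.
with_underscore=$(xmlstarlet sel \
    -t -v "count(//_:table[@id='test_table']//_:td)" \
    "$sample")

if [ "$with_n" = "$with_underscore" ]; then
    echo "both forms matched $with_n cells"
else
    echo "results differ: -N gave '$with_n', _ gave '$with_underscore'"
fi
rm -f "$sample"
```

If the two results differ, fall back to the explicit -N form.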


As mentioned under "common problems" in the xmlstarlet user guide (http://xmlstar.sourceforge.net/doc/UG/ch05.html), when you use the -N x="http://www.w3.org/1999/xhtml" option you also have to prefix the node names in your selections with x:, e.g.

xmlstarlet sel \
    -N x="http://www.w3.org/1999/xhtml" \
    -t \
    -m "//x:pre" \
    -v . somehtml.html

This selects all pre nodes and prints their contents.


To change a value in place rather than extract it, you can try:

xmlstarlet ed --inplace -u "html/body/table[@id='your_table_id']/tr[@id='row_id']/td[@id='data_id']" -v NEW_VALUE_TO_BE_CHANGED HTMLFILE_NAME 2>/dev/null
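Note that the unprefixed path in this command only matches when the document has no default namespace. For an XHTML file like the one discussed in the other answers, the element names need a prefix here too. The following is a sketch under that assumption; the sample file, the ids, and the values are placeholders invented for the example.

```sh
# Hypothetical XHTML sample with ids matching the placeholders used below.
sample=$(mktemp "${TMPDIR:-/tmp}/sample.XXXXXX")
cat > "$sample" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table id="your_table_id">
      <tr id="row_id"><td id="data_id">old value</td></tr>
    </table>
  </body>
</html>
EOF

# Update the cell in place; -N is given after --inplace, as the last global option.
xmlstarlet ed --inplace \
    -N x="http://www.w3.org/1999/xhtml" \
    -u "/x:html/x:body/x:table[@id='your_table_id']/x:tr[@id='row_id']/x:td[@id='data_id']" \
    -v "new value" \
    "$sample"

# Read the cell back to confirm the change took effect.
updated=$(xmlstarlet sel \
    -N x="http://www.w3.org/1999/xhtml" \
    -t -v "//x:td[@id='data_id']" \
    "$sample")
printf '%s\n' "$updated"
rm -f "$sample"
```

Reading the value back with sel is worthwhile because ed does not report an error when the path matches nothing; the file is simply left unchanged.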