Extract xml tag value using awk command


You can use awk as shown below. However, this is NOT a robust solution: it assumes every element sits on its own line and will fail if the XML is laid out differently, e.g. if there are multiple elements on the same line.

$ dt=$(awk -F '[<>]' '/IntrBkSttlmDt/{print $3}' file)
$ echo $dt
1967-08-13
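To see the failure mode for yourself, here is a small sketch. The file names and the sample XML are made up for illustration; only the IntrBkSttlmDt element and the awk one-liner come from the answer above.

```shell
# Hypothetical sample files, created in a scratch directory.
tmp=$(mktemp -d)

# Layout 1: one element per line (what the awk one-liner expects).
cat > "$tmp/one_per_line.xml" <<'EOF'
<GrpHdr>
<MsgId>A</MsgId>
<IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt>
</GrpHdr>
EOF

# Layout 2: same data, but two elements share a line.
cat > "$tmp/same_line.xml" <<'EOF'
<GrpHdr>
<MsgId>A</MsgId><IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt>
</GrpHdr>
EOF

# Same command on both files: split each matching line on < and >, print field 3.
good=$(awk -F '[<>]' '/IntrBkSttlmDt/{print $3}' "$tmp/one_per_line.xml")
bad=$(awk -F '[<>]' '/IntrBkSttlmDt/{print $3}' "$tmp/same_line.xml")

echo "one per line: $good"   # 1967-08-13
echo "same line:    $bad"    # A (the MsgId value, not the date)

rm -r "$tmp"
```

In the second layout the third field is whatever happens to follow the first tag on that line, so the command silently returns the wrong value instead of failing.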

I suggest you use a proper XML processing tool, like xmllint.

$ dt=$(xmllint --shell file <<< "cat //IntrBkSttlmDt/text()" | grep -v "^/ >")
$ echo $dt
1967-08-13


The following gawk command uses a regex as the record separator (RS) to match the XML tags. Anything that starts with a <, is followed by at least one character other than >, and ends with a > is considered a tag. Gawk stores the text matched by RS for each record in the RT variable. The text between two tags becomes the record itself, which gawk assigns to $0.

gawk 'BEGIN { RS="<[^>]+>" } { print RT, $0 }' myfile
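Building on the same idea, here is a rough sketch of how you could pull out a single value rather than print everything. It assumes gawk is installed (RT is a gawk extension, other awks do not have it), and the sample XML is made up for illustration. It is still not a real XML parser: comments, CDATA and nested elements with the same name are not handled.

```shell
# Hypothetical sample input, deliberately all on one line.
xml='<GrpHdr><MsgId>A</MsgId><IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt></GrpHdr>'

# Each record is the text in front of a tag, and RT holds the tag that ended it.
# So the record terminated by the closing tag is the element's text content.
dt=$(printf '%s\n' "$xml" |
     gawk 'BEGIN { RS = "<[^>]+>" } RT == "</IntrBkSttlmDt>" { print $0 }')

echo "$dt"   # 1967-08-13
```

Because records are split on tags instead of newlines, this does not care whether the elements are on one line or many.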


The code below stores all the tag values in an array. Hope this helps, but I still believe this is not an optimal way to do it.

> perl -lne 'if(/>[^<]*</){$_=~m/>([^<]*)</;push(@a,$1)}if(eof){foreach(@a){print $_}}' temp
A
2001-12-17T09:30:47
00.0
1967-08-13
CLRGxxAAAAAAAAAAA