Extract XML Value in bash script [duplicate]

xml bash shell sed

As Charles Duffy has stated, XML is best parsed with a proper XML parsing tool. For a one-time job, the following should work.

grep -oPm1 "(?<=<title>)[^<]+"

Test:

$ echo "$data"
<item>
  <title>15:54:57 - George:</title>
  <description>Diane DeConn? You saw Diane DeConn!</description>
</item>
<item>
  <title>15:55:17 - Jerry:</title>
  <description>Something huh?</description>
$ title=$(grep -oPm1 "(?<=<title>)[^<]+" <<< "$data")
$ echo "$title"
15:54:57 - George:


XMLStarlet or another XPath engine is the correct tool for this job.

For instance, with data.xml containing the following:

<root>
  <item>
    <title>15:54:57 - George:</title>
    <description>Diane DeConn? You saw Diane DeConn!</description>
  </item>
  <item>
    <title>15:55:17 - Jerry:</title>
    <description>Something huh?</description>
  </item>
</root>

...you can extract only the first title with the following:

xmlstarlet sel -t -m '(//title)[1]' -v . -n <data.xml

Trying to use sed for this job is troublesome. For instance, the regex-based approaches won't work if the title has attributes; won't handle CDATA sections; won't correctly recognize namespace mappings; can't determine whether a portion of the XML document is commented out; won't unescape entity references (such as changing Brewster &amp; Jobs to Brewster & Jobs), and so forth.
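Two of these failure modes are easy to reproduce with the grep one-liner from the first answer. This is only a sketch: the lang attribute and both input strings are made up for illustration, and it assumes a grep built with PCRE support (-P).

```shell
#!/usr/bin/env bash
# Hypothetical input: same title, but the tag now carries an attribute.
with_attr='<item><title lang="en">15:54:57 - George:</title></item>'

# The lookbehind needs the literal text "<title>", which no longer occurs,
# so grep matches nothing. "|| true" only hides grep's non-zero exit status.
title=$(grep -oPm1 '(?<=<title>)[^<]+' <<< "$with_attr" || true)
echo "with attribute: '${title}'"

# Hypothetical input: a title containing an escaped ampersand.
escaped='<title>Brewster &amp; Jobs</title>'

# The regex copies the raw characters; the entity reference is not decoded.
raw=$(grep -oPm1 '(?<=<title>)[^<]+' <<< "$escaped")
echo "with entity: '${raw}'"
```

The first call yields an empty string and the second keeps the literal &amp;, whereas an XML parser would return the title text in both cases.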


I agree with Charles Duffy that a proper XML parser is the right way to go.

As for what's wrong with your sed command (or did you do it on purpose?):

$data was not quoted, so it is subject to the shell's word splitting and filename expansion, among other things. One consequence is that the spacing in the XML snippet, including the line breaks, is not preserved.
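The effect of the missing quotes can be seen with a small experiment. This is a sketch with a shortened, made-up $data value rather than the asker's real input:

```shell
#!/usr/bin/env bash
# Hypothetical sample: three lines, the middle one indented.
data=$'<item>\n  <title>15:54:57 - George:</title>\n</item>'

# Unquoted: the shell splits the value on whitespace and echo rejoins the
# words with single spaces, so newlines and indentation are lost.
unquoted=$(echo $data)

# Quoted: the value reaches echo as one argument, unchanged.
quoted=$(echo "$data")

# grep -c '' counts lines.
lines_unquoted=$(grep -c '' <<< "$unquoted")
lines_quoted=$(grep -c '' <<< "$quoted")
echo "unquoted: $lines_unquoted line(s), quoted: $lines_quoted line(s)"
```

With the unquoted form everything ends up on a single line, which changes what a line-oriented tool such as sed gets to see.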

So, given your specific XML structure, this modified sed command should work:

title=$(sed -ne '/title/{s/.*<title>\(.*\)<\/title>.*/\1/p;q;}' <<< "$data")

Basically, on the first line that contains title, it extracts the text between the tags and then quits, so the second <title> is never extracted.
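To see the command in action, and why it depends on each title sitting on its own line, here is a rough sketch. Both sample values are invented for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical multi-line input, one element per line.
data=$'<item>\n  <title>15:54:57 - George:</title>\n</item>\n<item>\n  <title>15:55:17 - Jerry:</title>\n</item>'

# The first line matching /title/ is rewritten and printed, then q stops sed.
title=$(sed -ne '/title/{s/.*<title>\(.*\)<\/title>.*/\1/p;q;}' <<< "$data")
echo "multi-line input: $title"

# Hypothetical flattened input: both titles on a single line.
flat='<title>15:54:57 - George:</title> <title>15:55:17 - Jerry:</title>'

# The leading .* is greedy, so it runs up to the LAST <title> on the line.
flat_title=$(sed -ne '/title/{s/.*<title>\(.*\)<\/title>.*/\1/p;q;}' <<< "$flat")
echo "single-line input: $flat_title"
```

On the multi-line input this gives the first title; on the flattened input it gives the second one, which is another reason to quote "$data" and, for anything beyond a one-off, to use a real XML parser.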
