Parse HTML Using AWK

The result of a quick Google search for "xmlstarlet print div contents", followed by a few seconds of trial and error:

$ xmlstarlet sel -t -m "//*[@class='product-price']" -v "." -n file
100,56
200,56
300,56
400,56

For an explanation - ask google :-).
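For readers who would rather not search: in the command above, sel -t starts a selection template, -m matches every node selected by the XPath expression, -v "." prints the text value of each match, and -n adds a newline after it. The same query can be sketched in Python with the standard library's ElementTree, which supports a small subset of XPath. This is only an illustration, not part of the original answer: it assumes the input is well-formed XML, and the SAMPLE markup and the function name prices_via_xpath are made up, since the original input file is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real input file is not shown in this thread.
SAMPLE = """<html><body>
  <div class="product-name">Widget</div>
  <div class="product-price">100,56</div>
  <div class="product-price">200,56</div>
</body></html>"""


def prices_via_xpath(markup):
    """Return the text of every element whose class attribute is exactly 'product-price'."""
    root = ET.fromstring(markup)  # raises ParseError if the markup is not well-formed
    matches = root.findall(".//*[@class='product-price']")
    # itertext() gathers nested text too, similar to what -v "." prints.
    return ["".join(el.itertext()) for el in matches]


if __name__ == "__main__":
    for price in prices_via_xpath(SAMPLE):
        print(price)
```

Like the XPath in the xmlstarlet command, the predicate compares the whole class attribute, so an element with class="product-price sale" would not match.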

html shell awk

With your shown samples/attempts, please try the following awk code.

awk -F"[><]" '{gsub(/\r/,"")} /^[ \t]+<div[ \t]+class="product-price">.*<\/div>/{print $3}' Input_file

Explanation: Here is a detailed, annotated version of the above. It is for explanation purposes only; to run the code, please use the version above.

awk -F"[><]" '      ##Starting awk program from here and setting field separator as > or <.
{gsub(/\r/,"")}     ##Substituting control M chars at last of lines.
/^[ \t]+<div[ \t]+class="product-price">.*<\/div>/{ ##checking condition if line starts
                    ##from space followed by <div class="product-price"> till div close tag.
  print $3          ##printing 3rd column here.
}
' Input_file        ##Mentioning Input_file name here.

Changed the regex to /^[ \t]+<div[ \t]+class as per Ed's suggestions in the comments. Also, experts always recommend using xmlstarlet or other XML-aware tools if they are available on your system.
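To make the field logic of the awk command above easier to follow, here is a rough line-by-line Python mirror of it. It is a sketch for illustration only: the function name extract_prices and the sample lines are hypothetical, and it inherits the same limitation as the awk version, namely that the whole div must sit on one indented line.

```python
import re

# Same pattern the awk answer uses: leading blanks, then a one-line product-price div.
LINE_RE = re.compile(r'^[ \t]+<div[ \t]+class="product-price">.*</div>')


def extract_prices(lines):
    """Mimic the awk one-liner: strip carriage returns, split on < or >, keep field 3."""
    prices = []
    for line in lines:
        line = line.rstrip("\n").replace("\r", "")  # like gsub(/\r/,"")
        if LINE_RE.search(line):
            fields = re.split(r"[><]", line)        # like -F"[><]"
            prices.append(fields[2])                # awk's $3 is index 2 here
    return prices


if __name__ == "__main__":
    # Hypothetical input lines; the original Input_file is not shown in this thread.
    sample = [
        '  <div class="product-price">100,56</div>\r\n',
        '  <div class="product-name">Widget</div>\n',
    ]
    for price in extract_prices(sample):
        print(price)
```

Splitting the matching line on < and > gives the leading whitespace as the first field, the tag with its attribute as the second, and the price text as the third, which is why the awk program prints $3.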


If someone is looking for a Python solution, I would suggest using Python's BeautifulSoup library; the following was written and tested in Python 3.8. To keep it separate from my previous answer, I am adding it as another answer here.

#!/usr/bin/env python3
##Import library here.
from bs4 import BeautifulSoup

##Read Input_file and get all of its contents (the with block closes the file itself).
with open('Input_file', 'r') as f:
    contents = f.read()

##Get contents parsed by lxml in soup variable here.
soup = BeautifulSoup(contents, 'lxml')

##Get only those values which are specifically needed by OP, of div class product-price.
vals = soup.find_all("div", {"class": "product-price"})

##Print actual values out of tags.
for val in vals:
    print(val.text)

NOTE:

BeautifulSoup and lxml must both be installed, using pip3 or pip depending on your system.
Input_file is the file from which the program reads all of your data.
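If installing BeautifulSoup and lxml is not an option, a similar result can be sketched with only the standard library's html.parser module. This is a swapped-in technique, not the method used in the answer above; the class name PriceParser, the function name prices_via_html_parser, and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the text found inside <div> elements that carry the product-price class."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside a product-price div
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1          # nested div inside a price div
        elif "product-price" in classes:
            self.depth = 1
            self.prices.append("")   # start collecting text for a new price

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.prices[-1] += data


def prices_via_html_parser(html):
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.prices]


if __name__ == "__main__":
    # Hypothetical markup; replace with the contents of Input_file.
    sample = '<div class="product-name">Widget</div><div class="product-price">100,56</div>'
    for price in prices_via_html_parser(sample):
        print(price)
```

The depth counter is there so that a div nested inside a price div does not end collection early; for heavily malformed HTML, a dedicated parser such as the one used above is still the safer choice.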
