How to strip out all of the links of an HTML file in Bash or grep or batch and store them in a text file

shell


$ sed -n 's/.*href="\([^"]*\).*/\1/p' file
http://www.drawspace.com/lessons/b03/simple-symmetry
http://www.drawspace.com/lessons/b04/faces-and-a-vase
http://www.drawspace.com/lessons/b05/blind-contour-drawing
http://www.drawspace.com/lessons/b06/seeing-values
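
If the goal is also to store those links in a text file, as the question asks, the same sed command can simply be redirected; links.txt is just an illustrative name:

# Extract the href value from each line of file and save one URL per line to links.txt
sed -n 's/.*href="\([^"]*\).*/\1/p' file > links.txt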


You can use grep for this:

grep -Po '(?<=href=")[^"]*' file

It prints everything after href=" up to the next double quote.

With your given input it returns:

http://www.drawspace.com/lessons/b03/simple-symmetry
http://www.drawspace.com/lessons/b04/faces-and-a-vase
http://www.drawspace.com/lessons/b05/blind-contour-drawing
http://www.drawspace.com/lessons/b06/seeing-values

Note that it is not necessary to write cat drawspace.txt | grep '<a href=".*">'; you can avoid the useless use of cat with grep '<a href=".*">' drawspace.txt.
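
To also store the extracted links in a text file, as the question asks, just redirect the output; drawspace.txt and links.txt here are only illustrative names:

# Save every href value found in drawspace.txt, one per line, into links.txt
grep -Po '(?<=href=")[^"]*' drawspace.txt > links.txt

Keep in mind that -P (Perl-compatible regular expressions, needed for the lookbehind) requires GNU grep; the stock BSD grep on macOS does not support it.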

Another example

$ cat a
hello <a href="httafasdf">asdas</a>
hello <a href="hello">asdas</a>
other things

$ grep -Po '(?<=href=")[^"]*' a
httafasdf
hello


My guess is your PC or Mac will not have the lynx command installed by default (it's available for free on the web), but lynx will let you do things like this:

$ lynx -dump -image_links -listonly /usr/share/xdiagnose/workloads/youtube-reload.html

Output:

References

  1. file://localhost/usr/share/xdiagnose/workloads/youtube-reload.html
  2. http://www.youtube.com/v/zeNXuC3N5TQ&hl=en&fs=1&autoplay=1

It is then a simple matter to grep for the http: lines. And there may even be lynx options to print just the http: lines (lynx has many, many options).
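
For example, something along these lines should keep only the http/https URLs from the reference list and store them in a file; page.html and links.txt are placeholder names:

# Dump the link list from page.html, keep only the URLs, and save them to links.txt
lynx -dump -listonly page.html | grep -Eo 'https?://.*' > links.txt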