Download image file from the HTML page source using python?

python screen-scraping

Here is some code to download all the images from the supplied URL, and save them in the specified output folder. You can modify it to your own needs.

"""dumpimages.py    Downloads all the images on the supplied URL, and saves them to the    specified output file ("/test/" by default)Usage:    python dumpimages.py http://example.com/ [output]"""from bs4 import BeautifulSoup as bsfrom urllib.request import (    urlopen, urlparse, urlunparse, urlretrieve)import osimport sysdef main(url, out_folder="/test/"):    """Downloads all the images at 'url' to /test/"""    soup = bs(urlopen(url))    parsed = list(urlparse(url))    for image in soup.findAll("img"):        print("Image: %(src)s" % image)        filename = image["src"].split("/")[-1]        parsed[2] = image["src"]        outpath = os.path.join(out_folder, filename)        if image["src"].lower().startswith("http"):            urlretrieve(image["src"], outpath)        else:            urlretrieve(urlunparse(parsed), outpath)def _usage():    print("usage: python dumpimages.py http://example.com [outpath]")if __name__ == "__main__":    url = sys.argv[-1]    out_folder = "/test/"    if not url.lower().startswith("http"):        out_folder = sys.argv[-1]        url = sys.argv[-2]        if not url.lower().startswith("http"):            _usage()            sys.exit(-1)    main(url, out_folder)

Edit: You can specify the output folder now.


Ryan's solution is good, but it fails if the image source URLs are absolute, or are anything else that doesn't give a valid result when simply concatenated to the main page URL. urljoin recognizes absolute vs. relative URLs, so replace the loop in the middle with:

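    # Needs: from urllib.parse import urljoin
    for image in soup.find_all("img"):
        print("Image: %(src)s" % image)
        # urljoin resolves relative srcs against the page URL
        # and leaves absolute srcs unchanged
        image_url = urljoin(url, image["src"])
        filename = image["src"].split("/")[-1]
        outpath = os.path.join(out_folder, filename)
        urlretrieve(image_url, outpath)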
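To see what urljoin does with the different kinds of src values, here is a small sketch. The page URL and image paths are made-up examples, and resolve is a hypothetical helper name, not part of the answer above.

```python
from urllib.parse import urljoin

# Hypothetical page URL, used only for illustration
page_url = "http://example.com/gallery/index.html"


def resolve(src):
    """Turn an <img> src value into an absolute URL relative to page_url."""
    return urljoin(page_url, src)


# Relative to the page's directory
relative = resolve("images/cat.png")
# Relative to the site root
root_relative = resolve("/static/logo.png")
# Already absolute: returned unchanged
absolute = resolve("http://cdn.example.org/a.jpg")
# Protocol-relative: inherits the page's scheme
protocol_relative = resolve("//cdn.example.org/b.gif")

for result in (relative, root_relative, absolute, protocol_relative):
    print(result)
```

All four forms come out as complete URLs, so the loop no longer needs its own check for whether src starts with "http".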


You have to download the page, parse the HTML document, find the image you want (for example with a regex), and download it. You can use urllib2 (urllib.request in Python 3) for downloading and Beautiful Soup for parsing the HTML.
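Those steps might look roughly like the sketch below. To stay free of third-party packages it swaps in the standard library's html.parser where the answer suggests Beautiful Soup, and it targets Python 3, so it uses urllib.request rather than urllib2. The names ImgSrcCollector, find_image_urls and download_images are hypothetical, chosen for this example only.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen, urlretrieve


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # ignore <img> tags without a usable src
                self.sources.append(src)


def find_image_urls(html, page_url):
    """Return absolute URLs for all <img> tags in an HTML string."""
    collector = ImgSrcCollector()
    collector.feed(html)
    return [urljoin(page_url, src) for src in collector.sources]


def download_images(page_url, out_folder):
    """Fetch page_url and save each image it references into out_folder."""
    os.makedirs(out_folder, exist_ok=True)
    with urlopen(page_url) as response:
        # Assumes the page is UTF-8; undecodable bytes are replaced
        html = response.read().decode("utf-8", errors="replace")
    for image_url in find_image_urls(html, page_url):
        filename = os.path.basename(urlparse(image_url).path)
        if filename:  # skip URLs whose path has no file name
            urlretrieve(image_url, os.path.join(out_folder, filename))
```

Calling download_images("http://example.com/", "images") would fetch the page and write each image into an images folder. Keeping find_image_urls separate lets the parsing step be checked on a plain HTML string without any network access.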
