
download images from google with command line


First attempt

First you need to set a user agent so Google will authorize output from searches. Then we can search for images and select the desired one. Since wget returns Google's results page on one single line, we insert the missing newlines and then filter out the image link. The index of the desired image is stored in the variable count.

$ count=10
$ imagelink=$(wget --user-agent 'Mozilla/5.0' -qO - \
    "www.google.be/search?q=something\&tbm=isch" | sed 's/</\n</g' | \
    grep '<img' | head -n"$count" | tail -n1 | \
    sed 's/.*src="\([^"]*\)".*/\1/')
$ wget $imagelink

The image will now be in your working directory; you can tweak the last command and specify a desired output file name.
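For reference, here is the same pipeline broken out with comments; url and output.jpg are just stand-ins for the search link and the output name you might choose:

count=10
url="www.google.be/search?q=something\&tbm=isch"
imagelink=$(
    wget --user-agent 'Mozilla/5.0' -qO - "$url" |  # fetch the results page
    sed 's/</\n</g' |        # the page is one long line: break before every tag
    grep '<img' |            # keep only the image tags
    head -n"$count" |        # take the first $count of them ...
    tail -n1 |               # ... then just the $count-th
    sed 's/.*src="\([^"]*\)".*/\1/'  # extract the contents of the src attribute
)
wget -O output.jpg "$imagelink"      # -O picks the output file name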

You can summarize it in a shell script:

#! /bin/bash
count=${1}
shift
query="$@"
[ -z "$query" ] && exit 1  # insufficient arguments

imagelink=$(wget --user-agent 'Mozilla/5.0' -qO - \
    "www.google.be/search?q=${query}\&tbm=isch" | sed 's/</\n</g' | \
    grep '<img' | head -n"$count" | tail -n1 | \
    sed 's/.*src="\([^"]*\)".*/\1/')
wget -qO google_image $imagelink

Example usage:

$ ls
Documents
Downloads
Music
script.sh
$ chmod +x script.sh
$ bash script.sh 5 awesome
$ ls
Documents
Downloads
google_image
Music
script.sh

Now google_image should contain the fifth Google image for the query 'awesome'. If you experience any bugs, let me know and I'll take care of them.

Better code

The problem with this code is that it returns pictures in low resolution. A better solution is as follows:

#! /bin/bash

# function to create all dirs until the file can be made
function mkdirs {
    file="$1"
    dir="/"
    # convert to full path
    if [ "${file##/*}" ]; then
        file="${PWD}/${file}"
    fi
    # dir name of following dir
    next="${file#/}"
    # while not filename
    while [ "${next//[^\/]/}" ]; do
        # create dir if it doesn't exist
        [ -d "${dir}" ] || mkdir "${dir}"
        dir="${dir}/${next%%/*}"
        next="${next#*/}"
    done
    # last directory to make
    [ -d "${dir}" ] || mkdir "${dir}"
}

# get optional 'o' flag, this will open the image after download
getopts 'o' option
[[ $option = 'o' ]] && shift

# parse arguments
count=${1}
shift
query="$@"
[ -z "$query" ] && exit 1  # insufficient arguments

# set user agent, customize this by visiting http://whatsmyuseragent.com/
useragent='Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:31.0) Gecko/20100101 Firefox/31.0'

# construct google link
link="www.google.cz/search?q=${query}\&tbm=isch"

# fetch link for download
imagelink=$(wget -e robots=off --user-agent "$useragent" -qO - "$link" | \
    sed 's/</\n</g' | grep '<a href.*\(png\|jpg\|jpeg\)' | \
    sed 's/.*imgurl=\([^&]*\)\&.*/\1/' | head -n $count | tail -n1)
imagelink="${imagelink%\%*}"

# get file extension (.png, .jpg, .jpeg)
ext=$(echo $imagelink | sed "s/.*\(\.[^\.]*\)$/\1/")

# set default save location and file name, change this!!
dir="$PWD"
file="google image"

# get optional second argument, which defines the file name or dir
if [[ $# -eq 2 ]]; then
    if [ -d "$2" ]; then
        dir="$2"
    else
        file="${2}"
        mkdirs "${dir}"
        dir=""
    fi
fi

# construct image link: add 'echo "${google_image}"'
# after this line for debug output
google_image="${dir}/${file}"

# construct name, append number if file exists
if [[ -e "${google_image}${ext}" ]] ; then
    i=0
    while [[ -e "${google_image}(${i})${ext}" ]] ; do
        ((i++))
    done
    google_image="${google_image}(${i})${ext}"
else
    google_image="${google_image}${ext}"
fi

# get actual picture and store in google_image.$ext
wget --max-redirect 0 -qO "${google_image}" "${imagelink}"

# if 'o' flag supplied: open image
[[ $option = "o" ]] && gnome-open "${google_image}"

# successful execution, exit code 0
exit 0

The comments should be self-explanatory; if you have any questions about the code (such as the long pipeline), I'll be happy to clarify the mechanics. Note that I had to set a more detailed user agent for wget; it may happen that you need to set a different user agent, but I don't think that will be a problem. If you do have a problem, visit http://whatsmyuseragent.com/ and supply the output in the useragent variable.
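For anyone puzzling over the parameter expansions in mkdirs, here is a small illustration of what each one does (the path is just an example value):

path="/tmp/a/b/file.txt"

next="${path#/}"        # strip the leading slash         -> "tmp/a/b/file.txt"
echo "${next%%/*}"      # first path component            -> "tmp"
echo "${next#*/}"       # everything after that component -> "a/b/file.txt"
echo "${next//[^\/]/}"  # delete all non-slash characters -> "///"
                        # this becomes empty once no '/' remains,
                        # which is what ends the while loop in mkdirs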

When you wish to open the image instead of only downloading it, use the -o flag as in the example below. If you wish to extend the script and also include a custom output file name, just let me know and I'll add it for you.

Example usage:

$ chmod +x getimg.sh
$ ./getimg.sh 1 dog
$ gnome-open google_image.jpg
$ ./getimg.sh -o 10 donkey


This is an addition to the answer provided by ShellFish. Much respect to them for working this out. :)

Google have recently changed their web-code for the image results page which has, unfortunately, broken Shellfish's code. I was using it every night in a cron job up until about 4 days ago when it stopped receiving search-results. While investigating this, I found that Google have removed elements like imgurl and have shifted a lot more into javascript.

My solution is an expansion of Shellfish's great code but has modifications to handle these Google changes and includes some 'enhancements' of my own.

It performs a single Google search, saves the results, bulk-downloads a specified number of images, then builds these into a single gallery-image using ImageMagick. Up to 1,000 images can be requested.
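The script's internals aren't reproduced here, but for the curious: a gallery image of this kind can be assembled from a directory of downloaded images with ImageMagick's montage tool, for example (the tile count and spacing below are just example values):

$ montage *.jpg -tile 5x -geometry +2+2 gallery.jpg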

This bash script is available at https://git.io/googliser

Thank you.


Python code to download high resolution images from Google. I had posted the original answer here: Python - Download Images from google Image search?

It currently downloads 100 original images given a search query.

Code

from bs4 import BeautifulSoup
import requests
import re
import urllib2
import os
import cookielib
import json

def get_soup(url, header):
    return BeautifulSoup(urllib2.urlopen(urllib2.Request(url, headers=header)))

query = raw_input("query image")  # you can change the query for the image here
image_type = "ActiOn"
query = query.split()
query = '+'.join(query)
url = "https://www.google.co.in/search?q=" + query + "&source=lnms&tbm=isch"
print url

# add the directory for your image here
DIR = "C:\\Users\\Rishabh\\Pictures\\" + query.split('+')[0] + "\\"
header = {'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"}
soup = get_soup(url, header)

ActualImages = []  # contains the links for large original images, and the image type
for a in soup.find_all("div", {"class": "rg_meta"}):
    link, Type = json.loads(a.text)["ou"], json.loads(a.text)["ity"]
    ActualImages.append((link, Type))

print "there are total", len(ActualImages), "images"

### download the images
for i, (img, Type) in enumerate(ActualImages):
    try:
        # reuse the same browser header for the image request
        req = urllib2.Request(img, headers=header)
        raw_img = urllib2.urlopen(req).read()
        if not os.path.exists(DIR):
            os.mkdir(DIR)
        cntr = len([i for i in os.listdir(DIR) if image_type in i]) + 1
        print cntr
        if len(Type) == 0:
            f = open(DIR + image_type + "_" + str(cntr) + ".jpg", 'wb')
        else:
            f = open(DIR + image_type + "_" + str(cntr) + "." + Type, 'wb')
        f.write(raw_img)
        f.close()
    except Exception as e:
        print "could not load : " + img
        print e
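This is Python 2 code (urllib2, raw_input), so run it with a Python 2 interpreter. Assuming it is saved as download_images.py (any file name will do), a run looks roughly like this:

$ python2 download_images.py
query image dog
https://www.google.co.in/search?q=dog&source=lnms&tbm=isch
there are total 100 images
1
2
...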