Using VBA in Excel to Google Search in IE and return the hyperlink of the first result Using VBA in Excel to Google Search in IE and return the hyperlink of the first result vba vba

Using VBA in Excel to Google Search in IE and return the hyperlink of the first result


As its 60,000 records i recommend use xmlHTTP object instead of using IE.
HTTP requests a easier, and a lot faster

Download the sample file

Sub XMLHTTP()    Dim url As String, lastRow As Long, i As Long    Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object    Dim start_time As Date    Dim end_time As Date    lastRow = Range("A" & Rows.Count).End(xlUp).Row    Dim cookie As String    Dim result_cookie As String    start_time = Time    Debug.Print "start_time:" & start_time    For i = 2 To lastRow        url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)        Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")        XMLHTTP.Open "GET", url, False        XMLHTTP.setRequestHeader "Content-Type", "text/xml"        XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"        XMLHTTP.send        Set html = CreateObject("htmlfile")        html.body.innerHTML = XMLHTTP.ResponseText        Set objResultDiv = html.getelementbyid("rso")        Set objH3 = objResultDiv.getelementsbytagname("h3")        For Each link In objH3            If link.className = "r" Then                Cells(i, 2) = link.innerText                Cells(i, 3) = link.getelementsbytagname("a")(0).href                DoEvents            End If        Next    Next    end_time = Time    Debug.Print "end_time:" & end_time    Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time) & " :minutes"    MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)End Sub

Using CSS3 Selector

 Sub XMLHTTP1()        Dim url As String, i As Long, lastRow As Long        Dim XMLHTTP As Object, html As New HTMLDocument, objResultDiv As HTMLAnchorElement        lastRow = Range("A" & Rows.Count).End(xlUp).Row        For i = 2 To lastRow            url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)            Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")            XMLHTTP.Open "GET", url, False            XMLHTTP.setRequestHeader "Content-Type", "text/xml"            XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"            XMLHTTP.send            Set html = New HTMLDocument            html.body.innerHTML = XMLHTTP.ResponseText            Set objResultDiv = html.querySelector("div#rso h3.r a")            Cells(i, 2) = objResultDiv.innerText            Cells(i, 3) = objResultDiv.href            DoEvents        Next    End Sub

Output

enter image description here

HTH
Santosh


The links seem to be consistently within within H3 tags. Normally you might use something like the following to check until the page has loaded:

Private Declare Sub Sleep Lib "kernel32" (ByVal nMilliseconds As Long)Sub UseIE()    Dim ie As Object    Dim thePage As Object    Dim strTextOfPage As String    Set ie = CreateObject("InternetExplorer.Application")    'ie.FullScreen = True    With ie        '.Visible = True        .Navigate "http://www.bbc.co.uk"        While Not .ReadyState = READYSTATE_COMPLETE '4            Sleep 500      'wait 1/2 sec before trying again        Wend    End With    Set thePage = ie.Document    'more code hereEnd Sub

However, I would, instead, repeatedly try to reference the A element within the first H3 using getElementsByTagName("H3"), get the first of these elements, then look within this for the A-link and its href-attribute.

In JavaScript the attempts to reference non-existent elements would return undefined but from VBA it will probably need error-handling code.

Once I had obtained the href I would stop the navigation (not sure of the command for this, probably ie.Stop) or navigate to the next page immediately.

The first link(s) will, however, often be sponsored links and the href returned is a little garbled. The text of these sponsored links appear to include em tags. I might use this information to discard these links and look further down the page.

I don't know if there is a better way to do this.