Using VBA in Excel to Google Search in IE and return the hyperlink of the first result
As its 60,000 records i recommend use xmlHTTP object instead of using IE.
HTTP requests a easier, and a lot faster
Sub XMLHTTP() Dim url As String, lastRow As Long, i As Long Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object Dim start_time As Date Dim end_time As Date lastRow = Range("A" & Rows.Count).End(xlUp).Row Dim cookie As String Dim result_cookie As String start_time = Time Debug.Print "start_time:" & start_time For i = 2 To lastRow url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000) Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP") XMLHTTP.Open "GET", url, False XMLHTTP.setRequestHeader "Content-Type", "text/xml" XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0" XMLHTTP.send Set html = CreateObject("htmlfile") html.body.innerHTML = XMLHTTP.ResponseText Set objResultDiv = html.getelementbyid("rso") Set objH3 = objResultDiv.getelementsbytagname("h3") For Each link In objH3 If link.className = "r" Then Cells(i, 2) = link.innerText Cells(i, 3) = link.getelementsbytagname("a")(0).href DoEvents End If Next Next end_time = Time Debug.Print "end_time:" & end_time Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time) & " :minutes" MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)End Sub
Using CSS3 Selector
Sub XMLHTTP1() Dim url As String, i As Long, lastRow As Long Dim XMLHTTP As Object, html As New HTMLDocument, objResultDiv As HTMLAnchorElement lastRow = Range("A" & Rows.Count).End(xlUp).Row For i = 2 To lastRow url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000) Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP") XMLHTTP.Open "GET", url, False XMLHTTP.setRequestHeader "Content-Type", "text/xml" XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0" XMLHTTP.send Set html = New HTMLDocument html.body.innerHTML = XMLHTTP.ResponseText Set objResultDiv = html.querySelector("div#rso h3.r a") Cells(i, 2) = objResultDiv.innerText Cells(i, 3) = objResultDiv.href DoEvents Next End Sub
Output
HTH
Santosh
The links seem to be consistently within within H3 tags. Normally you might use something like the following to check until the page has loaded:
Private Declare Sub Sleep Lib "kernel32" (ByVal nMilliseconds As Long)Sub UseIE() Dim ie As Object Dim thePage As Object Dim strTextOfPage As String Set ie = CreateObject("InternetExplorer.Application") 'ie.FullScreen = True With ie '.Visible = True .Navigate "http://www.bbc.co.uk" While Not .ReadyState = READYSTATE_COMPLETE '4 Sleep 500 'wait 1/2 sec before trying again Wend End With Set thePage = ie.Document 'more code hereEnd Sub
However, I would, instead, repeatedly try to reference the A element within the first H3 using getElementsByTagName("H3")
, get the first of these elements, then look within this for the A-link and its href-attribute.
In JavaScript the attempts to reference non-existent elements would return undefined
but from VBA it will probably need error-handling code.
Once I had obtained the href I would stop the navigation (not sure of the command for this, probably ie.Stop
) or navigate to the next page immediately.
The first link(s) will, however, often be sponsored links and the href returned is a little garbled. The text of these sponsored links appear to include em
tags. I might use this information to discard these links and look further down the page.
I don't know if there is a better way to do this.