Unable to shake off hardcoded delay from my script Unable to shake off hardcoded delay from my script selenium selenium

Unable to shake off hardcoded delay from my script


Here is a completely different approach that doesn't require using a browser, instead it submits a series of web requests. With this approach, waiting for a page to load isn't a concern.

Typically, with lazy loading pages, it will submit a new request to load up the data for the page as you scroll. If you monitor the web traffic you can spot the requests made and emulate those, I have done that below.

The result should be a list of company names, in ascending order in whatever the first sheet of Excel is.

Things you'll need:

Add References to:

  • Microsoft Scripting Runtime
  • Microsoft XML v6.0
  • Add the VBA-JSON code to your project. You can find that here

Edit

Changed the code to keep pulling data from the site, until there is no more items in the list. Thanks @Qharr for pointing this out.

Code


Public Sub SubmitRequest()    Const baseURL As String = "http://fortune.com/api/v2/list/2358051/expand/item/ranking/asc/"    Dim Url            As String    Dim startingNumber As Long    Dim j              As Long    Dim getRequest     As MSXML2.XMLHTTP60    Dim Json           As Object    Dim Companies      As Object    Dim Company        As Variant    Dim CompanyArray   As Variant    'Create an array to hold each company    ReDim CompanyArray(0 To 50000)    'Create a new XMLHTTP object so we can place a get request    Set getRequest = New MSXML2.XMLHTTP60    'The api seems to only support returning 100 records at a time    'So do in batches of 100    Do        'Build the url, the format is something like        '0/100, where 0 is the starting position, and 100 is the ending position        Url = baseURL & startingNumber & "/" & startingNumber + 100        With getRequest            .Open "GET", Url            .send            'The response is a JSON object, for this code to work -            'You'll need this code https://github.com/VBA-tools/VBA-JSON            'What is returned is a dictionary            Set Json = JsonConverter.ParseJson(.responseText)            Set Companies = Json("list-items")            'Keep checking in batches of 100 until there are no more            If Companies.Count = 0 Then Exit Do            'Iterate the dictionary and return the title (which is the name)            For Each Company In Companies                CompanyArray(j) = Company("title")                j = j + 1            Next        End With        startingNumber = startingNumber + 100   Loop    ReDim Preserve CompanyArray(j - 1)    'Dump the data to the first sheet    ThisWorkbook.Sheets(1).Range("A1:A" & j) = WorksheetFunction.Transpose(CompanyArray)End Sub


There you go:

Sub Getlinks()    Dim driver As New ChromeDriver    Dim pcount As Long, R as long    Dim posts As Object, post As Object    With driver        .get "http://fortune.com/fortune500/list/"        Do            .ExecuteScript ("window.scrollTo(0, document.body.scrollHeight);")            Set posts = .FindElementsByClass("company-title")            pcount = posts.Count        Loop Until pcount = 1000        For Each post In posts            R = R + 1: Cells(R, 1) = post.Text        Next post    End WithEnd Sub

Or even better, print as you go:

Sub Getlinksasyougo()    Dim driver As New ChromeDriver    Dim pcount As Long, R As Long, i As Long    Dim posts As Object, post As Object    With driver        .get "http://fortune.com/fortune500/list/"        i = 1        Do            .ExecuteScript ("window.scrollTo(0, document.body.scrollHeight);")            Set posts = .FindElementsByClass("company-title")            pcount = posts.Count            If i <> pcount Then                For R = i To pcount - 1                    Cells(R, 1) = posts(R + 1).Text                Next R                i = pcount            End If        Loop Until pcount = 1000    End WithEnd Sub


Here's a way to approach it using the "look for the spinner element" method discussed in one of the comments, which helps you avoid having to specify the number of elements you're expecting the page to load. The class name of the spinner actually changes depending on whether or not it's visible, which makes it pretty easy to just wait for the spinner to become visible + disappear again before getting the page elements.

This method still involves some waiting; by default, it waits 1/10th of a second after each attempt to find the spinner, either until the spinner is found or for some maximum number of attempts. But that's much faster than waiting 5 seconds every time.

Also, unrelated, but don't write stuff to cells one at a time, it's really slow. It's much faster to write it to an array first + write the entire array at once.

Sub getLinks()    Dim bot As New ChromeDriver    bot.Get "http://fortune.com/fortune500/list/"    Dim posts As WebElements    Dim numPosts As Long    Dim finishedScrolling As Boolean    finishedScrolling = False    Do Until finishedScrolling        'Set beginning post count and scroll down        Dim startPosts As Long        startPosts = numPosts        bot.ExecuteScript "window.scrollTo(0, document.body.scrollHeight);"        'Wait for spinner to become visible, then wait for up to 5 seconds for rehide        Call waitForElements(bot, "div[class^='F500-spinner  ']", 50)        Call waitForElements(bot, "div[class^='F500-spinner hide']", 50)        'See if any new posts have loaded        Set posts = bot.FindElementsByClass("company-title")        numPosts = posts.Count        If numPosts = startPosts Then            finishedScrolling = True        End If    Loop    'Write text to results array    Dim post As WebElement    ReDim resultsArr(1 To posts.Count, 1 To 1) As String    Dim i As Long    i = 1    For Each post In posts        resultsArr(i, 1) = post.Text        i = i + 1    Next    'Write array to sheet    With ActiveSheet        .Range(.Cells(1, 1), .Cells(UBound(resultsArr, 1), 1)).Value = resultsArr    End WithEnd SubSub waitForElements(bot As WebDriver, css As String, maxAttempts As Long, Optional waitTimeMS As Long = 100)'Use a CSS selector string to wait for element(s) to appear on a page or to reach max number of attempts'By default, bot waits 0.1 second after each attempt    Dim i As Long    Dim foundElem As Boolean    foundElem = False    Do Until foundElem        i = i + 1        If bot.FindElementsByCss(css).Count > 0 Then            foundElem = True        ElseIf i = maxAttempts Then            foundElem = True        Else            bot.Wait waitTimeMS        End If    LoopEnd Sub