Unable to shake off hardcoded delay from my script
Here is a completely different approach that doesn't require using a browser, instead it submits a series of web requests. With this approach, waiting for a page to load isn't a concern.
Typically, with lazy loading pages, it will submit a new request to load up the data for the page as you scroll. If you monitor the web traffic you can spot the requests made and emulate those, I have done that below.
The result should be a list of company names, in ascending order in whatever the first sheet of Excel is.
Things you'll need:
Add References to:
- Microsoft Scripting Runtime
- Microsoft XML v6.0
- Add the VBA-JSON code to your project. You can find that here
Edit
Changed the code to keep pulling data from the site, until there is no more items in the list. Thanks @Qharr for pointing this out.
Code
Public Sub SubmitRequest() Const baseURL As String = "http://fortune.com/api/v2/list/2358051/expand/item/ranking/asc/" Dim Url As String Dim startingNumber As Long Dim j As Long Dim getRequest As MSXML2.XMLHTTP60 Dim Json As Object Dim Companies As Object Dim Company As Variant Dim CompanyArray As Variant 'Create an array to hold each company ReDim CompanyArray(0 To 50000) 'Create a new XMLHTTP object so we can place a get request Set getRequest = New MSXML2.XMLHTTP60 'The api seems to only support returning 100 records at a time 'So do in batches of 100 Do 'Build the url, the format is something like '0/100, where 0 is the starting position, and 100 is the ending position Url = baseURL & startingNumber & "/" & startingNumber + 100 With getRequest .Open "GET", Url .send 'The response is a JSON object, for this code to work - 'You'll need this code https://github.com/VBA-tools/VBA-JSON 'What is returned is a dictionary Set Json = JsonConverter.ParseJson(.responseText) Set Companies = Json("list-items") 'Keep checking in batches of 100 until there are no more If Companies.Count = 0 Then Exit Do 'Iterate the dictionary and return the title (which is the name) For Each Company In Companies CompanyArray(j) = Company("title") j = j + 1 Next End With startingNumber = startingNumber + 100 Loop ReDim Preserve CompanyArray(j - 1) 'Dump the data to the first sheet ThisWorkbook.Sheets(1).Range("A1:A" & j) = WorksheetFunction.Transpose(CompanyArray)End Sub
There you go:
Sub Getlinks() Dim driver As New ChromeDriver Dim pcount As Long, R as long Dim posts As Object, post As Object With driver .get "http://fortune.com/fortune500/list/" Do .ExecuteScript ("window.scrollTo(0, document.body.scrollHeight);") Set posts = .FindElementsByClass("company-title") pcount = posts.Count Loop Until pcount = 1000 For Each post In posts R = R + 1: Cells(R, 1) = post.Text Next post End WithEnd Sub
Or even better, print as you go:
Sub Getlinksasyougo() Dim driver As New ChromeDriver Dim pcount As Long, R As Long, i As Long Dim posts As Object, post As Object With driver .get "http://fortune.com/fortune500/list/" i = 1 Do .ExecuteScript ("window.scrollTo(0, document.body.scrollHeight);") Set posts = .FindElementsByClass("company-title") pcount = posts.Count If i <> pcount Then For R = i To pcount - 1 Cells(R, 1) = posts(R + 1).Text Next R i = pcount End If Loop Until pcount = 1000 End WithEnd Sub
Here's a way to approach it using the "look for the spinner element" method discussed in one of the comments, which helps you avoid having to specify the number of elements you're expecting the page to load. The class name of the spinner actually changes depending on whether or not it's visible, which makes it pretty easy to just wait for the spinner to become visible + disappear again before getting the page elements.
This method still involves some waiting; by default, it waits 1/10th of a second after each attempt to find the spinner, either until the spinner is found or for some maximum number of attempts. But that's much faster than waiting 5 seconds every time.
Also, unrelated, but don't write stuff to cells one at a time, it's really slow. It's much faster to write it to an array first + write the entire array at once.
Sub getLinks() Dim bot As New ChromeDriver bot.Get "http://fortune.com/fortune500/list/" Dim posts As WebElements Dim numPosts As Long Dim finishedScrolling As Boolean finishedScrolling = False Do Until finishedScrolling 'Set beginning post count and scroll down Dim startPosts As Long startPosts = numPosts bot.ExecuteScript "window.scrollTo(0, document.body.scrollHeight);" 'Wait for spinner to become visible, then wait for up to 5 seconds for rehide Call waitForElements(bot, "div[class^='F500-spinner ']", 50) Call waitForElements(bot, "div[class^='F500-spinner hide']", 50) 'See if any new posts have loaded Set posts = bot.FindElementsByClass("company-title") numPosts = posts.Count If numPosts = startPosts Then finishedScrolling = True End If Loop 'Write text to results array Dim post As WebElement ReDim resultsArr(1 To posts.Count, 1 To 1) As String Dim i As Long i = 1 For Each post In posts resultsArr(i, 1) = post.Text i = i + 1 Next 'Write array to sheet With ActiveSheet .Range(.Cells(1, 1), .Cells(UBound(resultsArr, 1), 1)).Value = resultsArr End WithEnd SubSub waitForElements(bot As WebDriver, css As String, maxAttempts As Long, Optional waitTimeMS As Long = 100)'Use a CSS selector string to wait for element(s) to appear on a page or to reach max number of attempts'By default, bot waits 0.1 second after each attempt Dim i As Long Dim foundElem As Boolean foundElem = False Do Until foundElem i = i + 1 If bot.FindElementsByCss(css).Count > 0 Then foundElem = True ElseIf i = maxAttempts Then foundElem = True Else bot.Wait waitTimeMS End If LoopEnd Sub