Use getElementById on HTMLElement instead of HTMLDocument
Sub Scrape() Dim Browser As InternetExplorer Dim Document As htmlDocument Dim Elements As IHTMLElementCollection Dim Element As IHTMLElement Set Browser = New InternetExplorer Browser.Visible = True Browser.navigate "http://www.stackoverflow.com" Do While Browser.Busy And Not Browser.readyState = READYSTATE_COMPLETE DoEvents Loop Set Document = Browser.Document Set Elements = Document.getElementById("hmenus").getElementsByTagName("li") For Each Element In Elements Debug.Print Element.innerText 'Questions 'Tags 'Users 'Badges 'Unanswered 'Ask Question Next Element Set Document = Nothing Set Browser = NothingEnd Sub
I don't like it either.
So use javascript:
Public Function GetJavaScriptResult(doc as HTMLDocument, jsString As String) As String Dim el As IHTMLElement Dim nd As HTMLDOMTextNode Set el = doc.createElement("INPUT") Do el.ID = GenerateRandomAlphaString(100) Loop Until Document.getElementById(el.ID) Is Nothing el.Style.display = "none" Set nd = Document.appendChild(el) doc.parentWindow.ExecScript "document.getElementById('" & el.ID & "').value = " & jsString GetJavaScriptResult = Document.getElementById(el.ID).Value Document.removeChild ndEnd FunctionFunction GenerateRandomAlphaString(Length As Long) As String Dim i As Long Dim Result As String Randomize Timer For i = 1 To Length Result = Result & Chr(Int(Rnd(Timer) * 26 + 65 + Round(Rnd(Timer)) * 32)) Next i GenerateRandomAlphaString = ResultEnd Function
Let me know if you have any problems with this; I've changed the context from a method to a function.
By the way, what version of IE are you using? I suspect you're on < IE8. If you upgrade to IE8 I presume it'll update shdocvw.dll to ieframe.dll and you will be able to use document.querySelector/All.
Edit
Comment response which isn't really a comment:Basically the way to do this in VBA is to traverse the child nodes. The problem is you don't get the correct return types. You could fix this by making your own classes that (separately) implement IHTMLElement and IHTMLElementCollection; but that's WAY too much of a pain for me to do it without getting paid :). If you're determined, go and read up on the Implements keyword for VB6/VBA.
Public Function getSubElementsByTagName(el As IHTMLElement, tagname As String) As Collection Dim descendants As New Collection Dim results As New Collection Dim i As Long getDescendants el, descendants For i = 1 To descendants.Count If descendants(i).tagname = tagname Then results.Add descendants(i) End If Next i getSubElementsByTagName = resultsEnd FunctionPublic Function getDescendants(nd As IHTMLElement, ByRef descendants As Collection) Dim i As Long descendants.Add nd For i = 1 To nd.Children.Length getDescendants nd.Children.Item(i), descendants Next iEnd Function
Thanks to dee for the answer above with the Scrape() subroutine. The code worked perfectly as written, and I was able to then convert the code to work with the specific website I am trying to scrape.
I do not have enough reputation to upvote or to comment, but I do actually have some minor improvements to add to dee's answer:
You will need to add the VBA Reference via "Tools\References" to "Microsoft HTML Object Library in order for the code to compile.
I commented out the Browser.Visible line and added the comment as follows
'if you need to debug the browser page, uncomment this line:'Browser.Visible = True
And I added a line to close the browser before Set Browser = Nothing:
Browser.Quit
Thanks again dee!
ETA: this works on machines with IE9, but not machines with IE8. Anyone have a fix?
Found the fix myself, so came back here to post it. The ClassName function is available in IE9. For this to work in IE8, you use querySelectorAll, with a dot preceding the class name of the object you are looking for:
'Set repList = doc.getElementsByClassName("reportList") 'only works in IE9, not in IE8Set repList = doc.querySelectorAll(".reportList") 'this works in IE8+