Scraper throws errors instead of quitting the browser when everything is done Scraper throws errors instead of quitting the browser when everything is done vba vba

Scraper throws errors instead of quitting the browser when everything is done


Ok, so there is something seriously unfriendly about that webpage. It kept crashing for me. So I have resorted to running a javascript program within scripting engine/scripting control and it works.

I hope you can follow it. The logic is in the javascript added to the ScriptEngine. I get two lists of nodes, one list of films and one list of years; then I step through each array in sync and add them as key value pair to a Microsoft Scripting Dictionary.

Option Explicit'*Tools->References'*    Microsoft Scripting Runtime'*    Microsoft Scripting Control'*    Microsoft Internet Controls'*    Microsoft HTML Object LibrarySub Torrent_Data()    Dim row As Long    Dim IE As New InternetExplorer, html As HTMLDocument    Dim post As Object    With IE        .Visible = True        .navigate "https://yts.am/browse-movies"        Do While .readyState <> READYSTATE_COMPLETE:            DoEvents        Loop        Set html = .document    End With    Dim dicFilms As Scripting.Dictionary    Set dicFilms = New Scripting.Dictionary    Call GetScriptEngine.Run("getMovies", html, dicFilms)    Dim vFilms As Variant    vFilms = dicFilms.Keys    Dim vYears As Variant    vYears = dicFilms.Items    Dim lRowLoop As Long    For lRowLoop = 0 To dicFilms.Count - 1        Cells(lRowLoop + 1, 1) = vFilms(lRowLoop)        Cells(lRowLoop + 1, 2) = vYears(lRowLoop)    Next lRowLoop    Stop    IE.QuitEnd SubPrivate Function GetScriptEngine() As ScriptControl    '* see code from this SO Q & A    ' https://stackoverflow.com/questions/37711073/in-excel-vba-on-windows-how-to-get-stringified-json-respresentation-instead-of    Static soScriptEngine As ScriptControl    If soScriptEngine Is Nothing Then        Set soScriptEngine = New ScriptControl        soScriptEngine.Language = "JScript"        soScriptEngine.AddCode "function getMovies(htmlDocument, microsoftDict) { " & _                                    "var titles = htmlDocument.querySelectorAll('a.browse-movie-title'), i;" & _                                    "var years = htmlDocument.querySelectorAll('div.browse-movie-year'), j;" & _                                    "if ( years.length === years.length) {" & _                                    "for (i=0; i< years.length; ++i) {" & _                                    "   var film = titles[i].innerText;" & _                                    "   var year = years[i].innerText;" & _                                    "   microsoftDict.Add(film, year);" & _                                    "}}}"    End If    Set GetScriptEngine = soScriptEngineEnd Function


The website has an API. Check e. g. result from the URL https://yts.am/api/v2/list_movies.json?page=1&limit=50, which actually represents 50 movies from first page of latest movies category, in JSON format.

Take a look at the below example. Import JSON.bas module into the VBA project for JSON processing.

Option ExplicitSub Test()    Dim sJSONString As String    Dim vJSON    Dim sState As String    Dim lPage As Long    Dim aRes()    Dim i As Long    Dim aData()    Dim aHeader()    With Sheets(1)        .Cells.Delete        .Cells.WrapText = False    End With    lPage = 1    aRes = Array()    Do        With CreateObject("MSXML2.XMLHTTP")            .Open "GET", "https://yts.am/api/v2/list_movies.json?page=" & lPage & "&limit=50", False            .send            sJSONString = .responseText        End With        JSON.Parse sJSONString, vJSON, sState        If Not vJSON("data").Exists("movies") Then Exit Do        vJSON = vJSON("data")("movies")        ReDim Preserve aRes(UBound(aRes) + UBound(vJSON) + 1)        For i = 0 To UBound(vJSON)            Set aRes(UBound(aRes) - UBound(vJSON) + i) = vJSON(i)        Next        lPage = lPage + 1        Debug.Print "Parsed " & (UBound(aRes) + 1)        DoEvents    Loop    JSON.ToArray aRes, aData, aHeader    With Sheets(1)        OutputArray .Cells(1, 1), aHeader        Output2DArray .Cells(2, 1), aData        .Columns.AutoFit    End With    MsgBox "Completed"End SubSub OutputArray(oDstRng As Range, aCells As Variant)    With oDstRng        .Parent.Select        With .Resize(1, UBound(aCells) - LBound(aCells) + 1)            .NumberFormat = "@"            .Value = aCells        End With    End WithEnd SubSub Output2DArray(oDstRng As Range, aCells As Variant)    With oDstRng        .Parent.Select        With .Resize( _                UBound(aCells, 1) - LBound(aCells, 1) + 1, _                UBound(aCells, 2) - LBound(aCells, 2) + 1)            .NumberFormat = "@"            .Value = aCells        End With    End WithEnd Sub

The output for me as follows, at the moment there are 7182 movies total:

output

BTW, the similar approach applied in other answers.