Convert HTML-table to Excel using VBA Convert HTML-table to Excel using VBA vba vba

Convert HTML-table to Excel using VBA


For a client side solution

So run this code after the first block of code, it rewrites the final two columns.

Sub Test2()    '* tools references ->    '*   Microsoft HTML Object Library    Dim oHtml4 As MSHTML.IHTMLDocument4    Set oHtml4 = New MSHTML.HTMLDocument    Dim oHtml As MSHTML.HTMLDocument    Set oHtml = Nothing    '* IHTMLDocument4.createDocumentFromUrl    '* MSDN - IHTMLDocument4 createDocumentFromUrl method - https://msdn.microsoft.com/en-us/library/aa752523(v=vs.85).aspx    Set oHtml = oHtml4.createDocumentFromUrl("https://rasmusrhl.github.io/stuff/", "")    While oHtml.readyState <> "complete"        DoEvents  '* do not comment this out it is required to break into the code if in infinite loop    Wend    Debug.Assert oHtml.readyState = "complete"    Dim oTRs As MSHTML.IHTMLDOMChildrenCollection    Set oTRs = oHtml.querySelectorAll("TR")    Debug.Assert oTRs.Length = 17    Dim lRowNum As Long    For lRowNum = 3 To oTRs.Length - 1        Dim oTRLoop As MSHTML.HTMLTableRow        Set oTRLoop = oTRs.Item(lRowNum)        If oTRLoop.ChildNodes.Length > 1 Then            Debug.Assert oTRLoop.ChildNodes.Length = 14            Dim oSecondToLastColumn As MSHTML.HTMLTableCell            Set oSecondToLastColumn = oTRLoop.ChildNodes.Item(12)            ActiveSheet.Cells(lRowNum + 2, 13).Value2 = "'" & oSecondToLastColumn.innerText            Dim oLastColumn As MSHTML.HTMLTableCell            Set oLastColumn = oTRLoop.ChildNodes.Item(13)            ActiveSheet.Cells(lRowNum + 2, 14).Value2 = "'" & oLastColumn.innerText        End If        'Stop    Next lRowNum    ActiveSheet.Columns("M:M").EntireColumn.AutoFit    ActiveSheet.Columns("N:N").EntireColumn.AutoFitEnd Sub

For a Server Side Solution

Now that we know you control the source script and that it is in R then one can change the R script to style the final columns with mso-number-format:'\@' . Here is a sample R script that achieves this, one builds a CSS matrix of the same dimensions as the data and passes the CSS matrix as a parameter into htmlTable. I have not tampered with your R source instead I give here a simple illustration for you to interpret.

A=matrix(c("(2)","(4)","(3)","(1)","(5)","(7)"),nrow=2,ncol=3,byrow=TRUE)css_matrix <- matrix(data="",nrow=2,ncol=3)css_matrix[,3] <- "mso-number-format:\"\\@\""htmlTable(x=A,css.cell=css_matrix)

Opening in Excel I get thisenter image description here

Robin Mackenzie adds

you might mention in your server-side solution that OP just needs to add css_matrix[,10:11] <- "mso-number-format:\"\@\"" to their existing R code (after the last css_matrix... line) and it will implement your solution for their specific problem

Thanks Robin


To get the tabular data (keeping the format as it is) from that page, you can try like below:

 Sub Fetch_Data()    Dim http As New XMLHTTP60, html As New HTMLDocument    Dim posts As Object, post As Object, elem As Object    Dim row As Long, col As Long    With http        .Open "GET", "https://rasmusrhl.github.io/stuff/", False        .send        html.body.innerHTML = .responseText    End With    Set posts = html.getElementsByClassName("gmisc_table")(0)    For Each post In posts.Rows        For Each elem In post.Cells            col = col + 1: Cells(row + 1, col).NumberFormat = "@": Cells(row + 1, col) = elem.innerText        Next elem        col = 0        row = row + 1    Next postEnd Sub

Reference to add to the library:

1. Microsoft HTML Object Library2. Microsoft XML, v6.0  'or whatever version you have

This is how that portion looks like when get parsed.enter image description here


This works with a temp file.

What it does:Downloads Data Locally. Then, replaces the "(" with a "\". Then, imports the data. Formats the data as text (to ensure we can change it back without error). Then, changes the text. This cannot be done with Range.Replace because that will reformat the cell contents.

' Local VariablesPublic FileName As String ' Temp File PathPublic FileUrl As String ' Url Formatted Temp File PathPublic DownloadUrl As String ' Where We're Going to Download From' Declares Have to Be At TopPrivate Declare Function GetTempPath Lib "kernel32" _  Alias "GetTempPathA" _  (ByVal nBufferLength As Long, _  ByVal lpBuffer As String) As LongPrivate Declare Function GetTempFileName Lib "kernel32" _  Alias "GetTempFileNameA" _  (ByVal lpszPath As String, _  ByVal lpPrefixString As String, _  ByVal wUnique As Long, _  ByVal lpTempFileName As String) As Long' Loads the HTML Content Without BugSub ImportHtml()    ' Set Our Download URL    DownloadUrl = "https://rasmusrhl.github.io/stuff"    ' Sets the Temporary File Path    SetFilePath    ' Downloads the File    DownloadFile    ' Replaces the "(" in the File With "\(", We Will Later Put it Back    ' This Ensures Formatting of Content Isn't Modified!!!    ReplaceStringInFile    ' Our Query Table is Now Coming From the Local File, Instead    Dim s As QueryTable    Set s = ActiveSheet.QueryTables.Add(Connection:=("FINDER;file://" + FileUrl), Destination:=Range("$A$1"))    With s        .Name = "stuff"        .FieldNames = True        .RowNumbers = False        .FillAdjacentFormulas = False        .PreserveFormatting = False        .RefreshOnFileOpen = False        .BackgroundQuery = True        .RefreshStyle = xlInsertDeleteCells        .SavePassword = False        .SaveData = True        .AdjustColumnWidth = True        .RefreshPeriod = 0        .WebSelectionType = xlEntirePage        .WebFormatting = xlWebFormattingAll        .WebPreFormattedTextToColumns = True        .WebConsecutiveDelimitersAsOne = True        .WebSingleBlockTextImport = False        .WebDisableDateRecognition = True        .WebDisableRedirections = False        .Refresh BackgroundQuery:=False        ' Sets Formatting So When We Change Text the Data Doesn't Change        .ResultRange.NumberFormat = "@"        ' Loop Through Cells in Range        ' If You Do Excel Replace, Instead It Will Change Cell Format        Const myStr As String = "\(", myReplace As String = "("        For Each c In .ResultRange.Cells            Do While c.Value Like "*" & myStr & "*"                c.Characters(InStr(1, c.Value, myStr), Len(myStr)).Text = myReplace            Loop        Next    End WithEnd Sub' This function replaces the "(" in the file with "\("Sub ReplaceStringInFile()    Dim sBuf As String    Dim sTemp As String    Dim iFileNum As Integer    Dim sFileName As String    ' Edit as needed    sFileName = FileName    iFileNum = FreeFile    Open sFileName For Input As iFileNum    Do Until EOF(iFileNum)        Line Input #iFileNum, sBuf        sTemp = sTemp & sBuf & vbCrLf    Loop    Close iFileNum    sTemp = Replace(sTemp, "(", "\(")    iFileNum = FreeFile    Open sFileName For Output As iFileNum    Print #iFileNum, sTemp    Close iFileNumEnd Sub' This function sets file paths because we need a temp fileFunction SetFilePath()    If FileName = "" Then        FileName = GetTempHtmlName        FileUrl = Replace(FileName, "\", "/")    End IfEnd Function' This subroutine downloads the file from the specified URL' The download is necessary because we will be editing the fileSub DownloadFile()    Dim myURL As String    myURL = "https://rasmusrhl.github.io/stuff"    Dim WinHttpReq As Object    Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")    WinHttpReq.Open "GET", DownloadUrl, False, "username", "password"    WinHttpReq.send    myURL = WinHttpReq.responseBody    If WinHttpReq.Status = 200 Then        Set oStream = CreateObject("ADODB.Stream")        oStream.Open        oStream.Type = 1        oStream.Write WinHttpReq.responseBody        oStream.SaveToFile FileName, 2 ' 1 = no overwrite, 2 = overwrite        oStream.Close    End IfEnd Sub'''''''''''''''''''''''''''''' THIS BLOCK OF CODE GETS A TEMPORARY FILE PATH USING THE GetTempHtmlName Function'''''''''''''''''''''''''''''Public Function GetTempHtmlName( _  Optional sPrefix As String = "VBA", _  Optional sExtensao As String = "") As String  Dim sTmpPath As String * 512  Dim sTmpName As String * 576  Dim nRet As Long  Dim F As String  nRet = GetTempPath(512, sTmpPath)  If (nRet > 0 And nRet < 512) Then    nRet = GetTempFileName(sTmpPath, sPrefix, 0, sTmpName)    If nRet <> 0 Then F = Left$(sTmpName, InStr(sTmpName, vbNullChar) - 1)    If sExtensao > "" Then      Kill F      If Right(F, 4) = ".tmp" Then F = Left(F, Len(F) - 4)      F = F & sExtensao    End If    F = Replace(F, ".tmp", ".html")    GetTempHtmlName = F  End IfEnd Function'''''''''''''''''''''''''''''' End - GetTempHtmlName'''''''''''''''''''''''''''''