Convert HTML-table to Excel using VBA
For a client side solution
So run this code after the first block of code, it rewrites the final two columns.
Sub Test2() '* tools references -> '* Microsoft HTML Object Library Dim oHtml4 As MSHTML.IHTMLDocument4 Set oHtml4 = New MSHTML.HTMLDocument Dim oHtml As MSHTML.HTMLDocument Set oHtml = Nothing '* IHTMLDocument4.createDocumentFromUrl '* MSDN - IHTMLDocument4 createDocumentFromUrl method - https://msdn.microsoft.com/en-us/library/aa752523(v=vs.85).aspx Set oHtml = oHtml4.createDocumentFromUrl("https://rasmusrhl.github.io/stuff/", "") While oHtml.readyState <> "complete" DoEvents '* do not comment this out it is required to break into the code if in infinite loop Wend Debug.Assert oHtml.readyState = "complete" Dim oTRs As MSHTML.IHTMLDOMChildrenCollection Set oTRs = oHtml.querySelectorAll("TR") Debug.Assert oTRs.Length = 17 Dim lRowNum As Long For lRowNum = 3 To oTRs.Length - 1 Dim oTRLoop As MSHTML.HTMLTableRow Set oTRLoop = oTRs.Item(lRowNum) If oTRLoop.ChildNodes.Length > 1 Then Debug.Assert oTRLoop.ChildNodes.Length = 14 Dim oSecondToLastColumn As MSHTML.HTMLTableCell Set oSecondToLastColumn = oTRLoop.ChildNodes.Item(12) ActiveSheet.Cells(lRowNum + 2, 13).Value2 = "'" & oSecondToLastColumn.innerText Dim oLastColumn As MSHTML.HTMLTableCell Set oLastColumn = oTRLoop.ChildNodes.Item(13) ActiveSheet.Cells(lRowNum + 2, 14).Value2 = "'" & oLastColumn.innerText End If 'Stop Next lRowNum ActiveSheet.Columns("M:M").EntireColumn.AutoFit ActiveSheet.Columns("N:N").EntireColumn.AutoFitEnd Sub
For a Server Side Solution
Now that we know you control the source script and that it is in R then one can change the R script to style the final columns with mso-number-format:'\@' . Here is a sample R script that achieves this, one builds a CSS matrix of the same dimensions as the data and passes the CSS matrix as a parameter into htmlTable
. I have not tampered with your R source instead I give here a simple illustration for you to interpret.
A=matrix(c("(2)","(4)","(3)","(1)","(5)","(7)"),nrow=2,ncol=3,byrow=TRUE)css_matrix <- matrix(data="",nrow=2,ncol=3)css_matrix[,3] <- "mso-number-format:\"\\@\""htmlTable(x=A,css.cell=css_matrix)
Robin Mackenzie adds
you might mention in your server-side solution that OP just needs to add css_matrix[,10:11] <- "mso-number-format:\"\@\"" to their existing R code (after the last css_matrix... line) and it will implement your solution for their specific problem
Thanks Robin
To get the tabular data (keeping the format as it is) from that page, you can try like below:
Sub Fetch_Data() Dim http As New XMLHTTP60, html As New HTMLDocument Dim posts As Object, post As Object, elem As Object Dim row As Long, col As Long With http .Open "GET", "https://rasmusrhl.github.io/stuff/", False .send html.body.innerHTML = .responseText End With Set posts = html.getElementsByClassName("gmisc_table")(0) For Each post In posts.Rows For Each elem In post.Cells col = col + 1: Cells(row + 1, col).NumberFormat = "@": Cells(row + 1, col) = elem.innerText Next elem col = 0 row = row + 1 Next postEnd Sub
Reference to add to the library:
1. Microsoft HTML Object Library2. Microsoft XML, v6.0 'or whatever version you have
This works with a temp file.
What it does:Downloads Data Locally. Then, replaces the "(" with a "\". Then, imports the data. Formats the data as text (to ensure we can change it back without error). Then, changes the text. This cannot be done with Range.Replace because that will reformat the cell contents.
' Local VariablesPublic FileName As String ' Temp File PathPublic FileUrl As String ' Url Formatted Temp File PathPublic DownloadUrl As String ' Where We're Going to Download From' Declares Have to Be At TopPrivate Declare Function GetTempPath Lib "kernel32" _ Alias "GetTempPathA" _ (ByVal nBufferLength As Long, _ ByVal lpBuffer As String) As LongPrivate Declare Function GetTempFileName Lib "kernel32" _ Alias "GetTempFileNameA" _ (ByVal lpszPath As String, _ ByVal lpPrefixString As String, _ ByVal wUnique As Long, _ ByVal lpTempFileName As String) As Long' Loads the HTML Content Without BugSub ImportHtml() ' Set Our Download URL DownloadUrl = "https://rasmusrhl.github.io/stuff" ' Sets the Temporary File Path SetFilePath ' Downloads the File DownloadFile ' Replaces the "(" in the File With "\(", We Will Later Put it Back ' This Ensures Formatting of Content Isn't Modified!!! ReplaceStringInFile ' Our Query Table is Now Coming From the Local File, Instead Dim s As QueryTable Set s = ActiveSheet.QueryTables.Add(Connection:=("FINDER;file://" + FileUrl), Destination:=Range("$A$1")) With s .Name = "stuff" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = False .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlEntirePage .WebFormatting = xlWebFormattingAll .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = True .WebDisableRedirections = False .Refresh BackgroundQuery:=False ' Sets Formatting So When We Change Text the Data Doesn't Change .ResultRange.NumberFormat = "@" ' Loop Through Cells in Range ' If You Do Excel Replace, Instead It Will Change Cell Format Const myStr As String = "\(", myReplace As String = "(" For Each c In .ResultRange.Cells Do While c.Value Like "*" & myStr & "*" c.Characters(InStr(1, c.Value, myStr), Len(myStr)).Text = myReplace Loop Next End WithEnd Sub' This function replaces the "(" in the file with "\("Sub ReplaceStringInFile() Dim sBuf As String Dim sTemp As String Dim iFileNum As Integer Dim sFileName As String ' Edit as needed sFileName = FileName iFileNum = FreeFile Open sFileName For Input As iFileNum Do Until EOF(iFileNum) Line Input #iFileNum, sBuf sTemp = sTemp & sBuf & vbCrLf Loop Close iFileNum sTemp = Replace(sTemp, "(", "\(") iFileNum = FreeFile Open sFileName For Output As iFileNum Print #iFileNum, sTemp Close iFileNumEnd Sub' This function sets file paths because we need a temp fileFunction SetFilePath() If FileName = "" Then FileName = GetTempHtmlName FileUrl = Replace(FileName, "\", "/") End IfEnd Function' This subroutine downloads the file from the specified URL' The download is necessary because we will be editing the fileSub DownloadFile() Dim myURL As String myURL = "https://rasmusrhl.github.io/stuff" Dim WinHttpReq As Object Set WinHttpReq = CreateObject("Microsoft.XMLHTTP") WinHttpReq.Open "GET", DownloadUrl, False, "username", "password" WinHttpReq.send myURL = WinHttpReq.responseBody If WinHttpReq.Status = 200 Then Set oStream = CreateObject("ADODB.Stream") oStream.Open oStream.Type = 1 oStream.Write WinHttpReq.responseBody oStream.SaveToFile FileName, 2 ' 1 = no overwrite, 2 = overwrite oStream.Close End IfEnd Sub'''''''''''''''''''''''''''''' THIS BLOCK OF CODE GETS A TEMPORARY FILE PATH USING THE GetTempHtmlName Function'''''''''''''''''''''''''''''Public Function GetTempHtmlName( _ Optional sPrefix As String = "VBA", _ Optional sExtensao As String = "") As String Dim sTmpPath As String * 512 Dim sTmpName As String * 576 Dim nRet As Long Dim F As String nRet = GetTempPath(512, sTmpPath) If (nRet > 0 And nRet < 512) Then nRet = GetTempFileName(sTmpPath, sPrefix, 0, sTmpName) If nRet <> 0 Then F = Left$(sTmpName, InStr(sTmpName, vbNullChar) - 1) If sExtensao > "" Then Kill F If Right(F, 4) = ".tmp" Then F = Left(F, Len(F) - 4) F = F & sExtensao End If F = Replace(F, ".tmp", ".html") GetTempHtmlName = F End IfEnd Function'''''''''''''''''''''''''''''' End - GetTempHtmlName'''''''''''''''''''''''''''''