Convert HTML to plain text in VBA
Set a reference to "Microsoft HTML Object Library" (in the VBA editor: Tools > References).
Function HtmlToText(sHTML) As String
    Dim oDoc As HTMLDocument
    Set oDoc = New HTMLDocument
    oDoc.body.innerHTML = sHTML
    HtmlToText = oDoc.body.innerText
End Function
Tim
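If setting the reference is not an option, the same idea can be sketched with late binding. This assumes the "htmlfile" ProgID is registered on the machine, which is normally the case on Windows; HtmlToTextLate is a hypothetical name, not part of the answer above.

```vba
' Hypothetical late-bound variant: no project reference required.
Function HtmlToTextLate(ByVal sHTML As String) As String
    Dim oDoc As Object
    ' "htmlfile" creates an MSHTML document object at runtime
    Set oDoc = CreateObject("htmlfile")
    oDoc.body.innerHTML = sHTML
    HtmlToTextLate = oDoc.body.innerText
End Function
```

The trade-off is that late binding gives up IntelliSense and compile-time checking in exchange for not depending on the project's reference list.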
A very simple way to extract text is to scan the HTML character by character, and accumulate characters outside of angle brackets into a new string.
Function StripTags(ByVal html As String) As String
    Dim text As String
    Dim accumulating As Boolean
    Dim n As Long   ' Long, so inputs longer than 32,767 characters do not overflow
    Dim c As String

    text = ""
    accumulating = True
    n = 1
    Do While n <= Len(html)
        c = Mid(html, n, 1)
        If c = "<" Then
            accumulating = False    ' entering a tag
        ElseIf c = ">" Then
            accumulating = True     ' leaving a tag
        ElseIf accumulating Then
            text = text & c
        End If
        n = n + 1
    Loop
    StripTags = text
End Function
This can leave lots of extraneous whitespace, but it will help in removing the tags.
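The leftover whitespace can be tidied up in a second pass. A minimal sketch, assuming it is acceptable to collapse every run of spaces, tabs and line breaks into a single space; CollapseWhitespace is a hypothetical helper name.

```vba
' Hypothetical helper: collapses runs of whitespace into single spaces.
Function CollapseWhitespace(ByVal s As String) As String
    ' Treat tabs and line breaks as ordinary spaces
    s = Replace(s, vbTab, " ")
    s = Replace(s, vbCr, " ")
    s = Replace(s, vbLf, " ")
    ' Repeat until no double spaces remain
    Do While InStr(s, "  ") > 0
        s = Replace(s, "  ", " ")
    Loop
    ' Drop leading and trailing spaces
    CollapseWhitespace = Trim$(s)
End Function
```

For example, CollapseWhitespace(StripTags("<p>Hello</p>  <p>world</p>")) returns "Hello world".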
Tim's solution was great, worked like a charm.
I'd like to contribute: use this code to add the "Microsoft HTML Object Library" reference at runtime:
Set ID = ThisWorkbook.VBProject.References
ID.AddFromGuid "{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}", 2, 5
It worked on Windows XP and Windows 7.
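Running AddFromGuid a second time raises an error if the reference is already set, so it may help to check first. A rough sketch, assuming "Trust access to the VBA project object model" is enabled in the Trust Center (programmatic access to VBProject fails otherwise); EnsureHtmlReference is a hypothetical name, and the version numbers are copied from the snippet above.

```vba
' Hypothetical wrapper: adds the reference only if it is not already set.
' Assumes "Trust access to the VBA project object model" is enabled.
Sub EnsureHtmlReference()
    Const HTML_GUID As String = "{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}"
    Dim ref As Object

    ' Look for an existing reference with the same GUID
    For Each ref In ThisWorkbook.VBProject.References
        If StrComp(ref.GUID, HTML_GUID, vbTextCompare) = 0 Then Exit Sub
    Next ref

    ' Not found, so add it (version numbers as in the original snippet)
    ThisWorkbook.VBProject.References.AddFromGuid HTML_GUID, 2, 5
End Sub
```

Calling this from Workbook_Open is one option, although code that uses early-bound types such as HTMLDocument still has to compile before the reference is added.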