
Convert html to plain text in VBA


Set a reference to "Microsoft HTML object library".

    Function HtmlToText(sHTML) As String
        Dim oDoc As HTMLDocument
        Set oDoc = New HTMLDocument
        oDoc.body.innerHTML = sHTML
        HtmlToText = oDoc.body.innerText
    End Function
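If setting a project reference is not an option, the same idea can probably be written with late binding. This is only a sketch, not part of the original answer: it assumes the "htmlfile" ProgID is registered on the machine (it normally is on Windows), and the names HtmlToTextLateBound and DemoHtmlToText are made up for illustration.

```vba
' Late-bound variant: no reference to the Microsoft HTML Object Library is needed.
' Assumption: CreateObject("htmlfile") returns an MSHTML document object.
Function HtmlToTextLateBound(ByVal sHTML As String) As String
    Dim oDoc As Object
    Set oDoc = CreateObject("htmlfile")
    oDoc.body.innerHTML = sHTML              ' let MSHTML parse the markup
    HtmlToTextLateBound = oDoc.body.innerText ' rendered text, entities decoded
End Function

' Hypothetical usage: prints the extracted text to the Immediate window.
Sub DemoHtmlToText()
    Debug.Print HtmlToTextLateBound("<p>Hello <b>world</b> &amp; friends</p>")
End Sub
```

The trade-off is the usual one for late binding: no IntelliSense and no compile-time checking, but the code compiles on machines where the reference has not been set.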

Tim


A very simple way to extract text is to scan the HTML character by character, and accumulate characters outside of angle brackets into a new string.

    Function StripTags(ByVal html As String) As String
        Dim text As String
        Dim accumulating As Boolean
        Dim n As Long    ' Long rather than Integer, so input longer than 32,767 characters does not overflow
        Dim c As String

        text = ""
        accumulating = True    ' True while we are outside a tag
        n = 1
        Do While n <= Len(html)
            c = Mid(html, n, 1)
            If c = "<" Then
                accumulating = False    ' entering a tag
            ElseIf c = ">" Then
                accumulating = True     ' leaving a tag
            Else
                If accumulating Then
                    text = text & c
                End If
            End If
            n = n + 1
        Loop
        StripTags = text
    End Function

This removes the tags, but it can leave a lot of extraneous whitespace, and character entities such as &amp; are not decoded.
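One possible way to tidy up the leftover whitespace is to turn line breaks and tabs into spaces and then collapse runs of spaces. The following is a minimal sketch under that assumption; CollapseWhitespace is a made-up name, not something from the answer above.

```vba
' Replaces CR, LF and tab with a space, collapses runs of spaces into one,
' and trims leading/trailing spaces.
Function CollapseWhitespace(ByVal s As String) As String
    Dim result As String
    result = Replace(s, vbCr, " ")
    result = Replace(result, vbLf, " ")
    result = Replace(result, vbTab, " ")
    ' Keep halving double spaces until none are left.
    Do While InStr(result, "  ") > 0
        result = Replace(result, "  ", " ")
    Loop
    CollapseWhitespace = Trim$(result)
End Function
```

It could be applied to the output of the tag stripper, e.g. CollapseWhitespace(StripTags(html)). Note that it also flattens paragraph breaks, which may or may not be what you want.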


Tim's solution was great; it worked like a charm.

I'd like to contribute: use this code to add the "Microsoft HTML Object Library" reference at run time:

    Set ID = ThisWorkbook.VBProject.References
    ID.AddFromGuid "{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}", 2, 5

It worked on Windows XP and Windows 7.
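A caveat worth sketching: as far as I know, AddFromGuid raises an error if the reference is already present, and any access to VBProject requires "Trust access to the VBA project object model" to be enabled in the Trust Center. A guarded version might look like this. HasReference and EnsureHtmlLibraryReference are hypothetical helper names; the GUID and version numbers are simply the ones quoted above.

```vba
' Hypothetical helper: True if this workbook's project already references libGuid.
Function HasReference(ByVal libGuid As String) As Boolean
    Dim ref As Object
    For Each ref In ThisWorkbook.VBProject.References
        If StrComp(ref.Guid, libGuid, vbTextCompare) = 0 Then
            HasReference = True
            Exit Function
        End If
    Next ref
End Function

' Adds the Microsoft HTML Object Library only when it is missing,
' so calling this more than once does not fail on a duplicate reference.
Sub EnsureHtmlLibraryReference()
    Const MSHTML_GUID As String = "{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}"
    If Not HasReference(MSHTML_GUID) Then
        ThisWorkbook.VBProject.References.AddFromGuid MSHTML_GUID, 2, 5
    End If
End Sub
```

Calling EnsureHtmlLibraryReference from Workbook_Open is one option, though code that declares variables As HTMLDocument still needs the reference in place before that module is compiled.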