How do I unescape HTML entities in a string in Python 3.1? [duplicate]


You could use the function html.unescape:

In Python 3.4+ (thanks to J.F. Sebastian for the update):

    import html
    html.unescape('Suzy &amp; John')  # 'Suzy & John'
    html.unescape('&quot;')  # '"'

In Python 3.3 or older (HTMLParser.unescape was deprecated in 3.4 and removed in 3.9):

    import html.parser
    html.parser.HTMLParser().unescape('Suzy &amp; John')

In Python 2:

    import HTMLParser
    HTMLParser.HTMLParser().unescape('Suzy &amp; John')
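If the same code has to run on several of these versions, the three variants can be folded into one import fallback. This is only a sketch: the module-level name `unescape` is my own choice, and the older branches rely on the `HTMLParser().unescape` method shown above.

```python
# Pick whichever unescape implementation this interpreter provides.
try:
    from html import unescape  # Python 3.4+
except ImportError:
    try:
        from html.parser import HTMLParser  # Python 3.0 to 3.3
    except ImportError:
        from HTMLParser import HTMLParser  # Python 2
    # Bind the parser method so it can be called like html.unescape.
    unescape = HTMLParser().unescape

print(unescape('Suzy &amp; John'))  # Suzy & John
print(unescape('&quot;'))  # "
```

Trying `html.unescape` first matters because the `HTMLParser().unescape` method no longer exists in recent Python 3 releases.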


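You can use xml.sax.saxutils.unescape for this purpose. This module is included in the Python standard library, and is portable between Python 2.x and Python 3.x. Note that by default it only unescapes &amp;, &lt;, and &gt;; any other entity has to be supplied through its optional entities argument.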

    >>> import xml.sax.saxutils as saxutils
    >>> saxutils.unescape("Suzy &amp; John")
    'Suzy & John'
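To handle quotation marks with this function, you can pass extra replacements through its second parameter, `entities`. A minimal sketch; the `extra_entities` dictionary and the sample text are made up for illustration:

```python
from xml.sax.saxutils import unescape

# Extra replacements beyond the built-in &amp;, &lt; and &gt;.
# Keys are the escaped forms, values are the characters they stand for.
extra_entities = {"&quot;": '"', "&apos;": "'"}

text = "&quot;Suzy&quot; &amp; John"

print(unescape(text))                  # &quot;Suzy&quot; & John
print(unescape(text, extra_entities))  # "Suzy" & John
```

This still only covers the entities you list yourself, so for arbitrary HTML input a full entity table is the safer choice.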


Apparently I don't have a high enough reputation to do anything but post this. unutbu's answer does not unescape quotations. The only thing that I found that did was this function:

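    import re
    from htmlentitydefs import name2codepoint as n2cp  # Python 2 module; html.entities in Python 3

    def decodeHtmlentities(string):
        def substitute_entity(match):
            ent = match.group(2)
            if match.group(1) == "#":
                # Numeric character reference, e.g. &#34;
                return unichr(int(ent))  # unichr is Python 2; chr in Python 3
            else:
                # Named entity, e.g. &quot;
                cp = n2cp.get(ent)
                if cp:
                    return unichr(cp)
                else:
                    # Unknown name: leave the text as it was
                    return match.group()

        entity_re = re.compile(r"&(#?)(\d{1,5}|\w{1,8});")
        return entity_re.subn(substitute_entity, string)[0]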

Which I got from this page.
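Since the question is about Python 3.1, here is a rough Python 3 port of the same idea, using `html.entities.name2codepoint` and `chr`. It is a sketch rather than the original author's code: the names `decode_html_entities` and `ENTITY_RE` are my own, and I added handling for hexadecimal references such as `&#x27;`, which the function above does not cover.

```python
import re
from html.entities import name2codepoint

# Matches decimal (&#39;), hexadecimal (&#x27;) and named (&quot;) references.
ENTITY_RE = re.compile(r"&(?:#(\d{1,7})|#[xX]([0-9a-fA-F]{1,6})|(\w{1,8}));")


def decode_html_entities(text):
    """Replace HTML entities in text with the characters they stand for."""

    def substitute(match):
        decimal, hexadecimal, name = match.groups()
        try:
            if decimal:
                return chr(int(decimal))
            if hexadecimal:
                return chr(int(hexadecimal, 16))
        except (ValueError, OverflowError):
            # Code point outside the Unicode range: leave the text unchanged.
            return match.group()
        codepoint = name2codepoint.get(name)
        # Unknown names are left unchanged as well.
        return chr(codepoint) if codepoint else match.group()

    return ENTITY_RE.sub(substitute, text)


print(decode_html_entities("&quot;Suzy&quot; &amp; John"))  # "Suzy" & John
print(decode_html_entities("&#39;&#x27;"))  # ''
```

Because the substitution is a single pass, already-decoded text is not decoded twice: `&amp;lt;` becomes `&lt;`, not `<`.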