
How to convert a string to utf-8 in Python


In Python 2

>>> plain_string = "Hi!"
>>> unicode_string = u"Hi!"
>>> type(plain_string), type(unicode_string)
(<type 'str'>, <type 'unicode'>)

^ This is the difference between a byte string (plain_string) and a unicode string (unicode_string).

>>> s = "Hello!"
>>> u = unicode(s, "utf-8")

^ Converting to unicode and specifying the encoding.

In Python 3

All strings are unicode, so the unicode function does not exist anymore. See the answer from @Noumenon.
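
As a rough sketch of what that means in practice (the variable names are only illustrative, not taken from any answer here): in Python 3 a str is already Unicode text, so "converting to utf-8" usually means calling str.encode to get bytes, and bytes.decode takes you back.

```python
# Python 3: str holds Unicode text, bytes holds encoded data.
text = "Héllo!"                          # str (Unicode text)
utf8_bytes = text.encode("utf-8")        # str -> bytes, encoded as UTF-8
round_trip = utf8_bytes.decode("utf-8")  # bytes -> str again

print(type(text).__name__, type(utf8_bytes).__name__)  # str bytes
print(utf8_bytes)                                      # b'H\xc3\xa9llo!'
print(round_trip == text)                              # True
```

Note that calling .decode on a str raises AttributeError in Python 3, since only bytes objects can be decoded.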


If the methods above don't work, you can also tell Python to ignore the parts of a byte string (a Python 2 str or a Python 3 bytes object) that it can't decode as utf-8:

stringnamehere.decode('utf-8', 'ignore')
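
For illustration, here is roughly how the error handlers behave on a Python 3 bytes object. The input below is a made-up example with one stray byte; 'replace' is shown only as an alternative if you would rather see where data was lost than drop it silently.

```python
# Hypothetical input: ASCII text with one byte (0xff) that is never valid UTF-8.
raw = b"Hi\xff there"

ignored = raw.decode("utf-8", "ignore")    # silently drops the bad byte
replaced = raw.decode("utf-8", "replace")  # substitutes U+FFFD for the bad byte

print(ignored)         # Hi there
print(repr(replaced))  # 'Hi\ufffd there'

# The default handler ('strict') raises instead of guessing.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)
```

Keep in mind that 'ignore' loses data without telling you, so it is probably best kept for cases where the undecodable bytes really don't matter.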


This might be a bit overkill, but when I work with ASCII and unicode in the same files, repeating decode can be a pain, so this is what I use (Python 2):

def make_unicode(inp):
    if type(inp) != unicode:
        inp = inp.decode('utf-8')
    return inp
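
If you are on Python 3, a similar helper might look something like the sketch below. The name make_str and the encoding parameter are my own assumptions, not part of the answer above; the idea is the same, only the type check changes because unicode no longer exists.

```python
def make_str(inp, encoding="utf-8"):
    """Hypothetical Python 3 counterpart of make_unicode.

    Returns inp unchanged if it is already a str; decodes it if it is
    bytes or bytearray.
    """
    if isinstance(inp, (bytes, bytearray)):
        return inp.decode(encoding)
    return inp


print(make_str(b"Hi!"))               # Hi!
print(make_str("Hi!"))                # Hi!
print(make_str(b"caf\xc3\xa9"))       # café
```

Using isinstance rather than comparing type() directly means subclasses of bytes are handled too, which seems like the safer default here.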