Usage of unicode() and encode() functions in Python Usage of unicode() and encode() functions in Python sqlite sqlite

Usage of unicode() and encode() functions in Python


str is text representation in bytes, unicode is text representation in characters.

You decode text from bytes to unicode and encode a unicode into bytes with some encoding.

That is:

>>> 'abc'.decode('utf-8')  # str to unicodeu'abc'>>> u'abc'.encode('utf-8') # unicode to str'abc'

UPD Sep 2020: The answer was written when Python 2 was mostly used. In Python 3, str was renamed to bytes, and unicode was renamed to str.

>>> b'abc'.decode('utf-8') # bytes to str'abc'>>> 'abc'.encode('utf-8'). # str to bytesb'abc'


You are using encode("utf-8") incorrectly. Python byte strings (str type) have an encoding, Unicode does not. You can convert a Unicode string to a Python byte string using uni.encode(encoding), and you can convert a byte string to a Unicode string using s.decode(encoding) (or equivalently, unicode(s, encoding)).

If fullFilePath and path are currently a str type, you should figure out how they are encoded. For example, if the current encoding is utf-8, you would use:

path = path.decode('utf-8')fullFilePath = fullFilePath.decode('utf-8')

If this doesn't fix it, the actual issue may be that you are not using a Unicode string in your execute() call, try changing it to the following:

cur.execute(u"update docs set path = :fullFilePath where path = :path", locals())


Make sure you've set your locale settings right before running the script from the shell, e.g.

$ locale -a | grep "^en_.\+UTF-8"en_GB.UTF-8en_US.UTF-8$ export LC_ALL=en_GB.UTF-8$ export LANG=en_GB.UTF-8

Docs: man locale, man setlocale.