
Python ASCII and Unicode decode error


You need to take a disciplined approach. Ned Batchelder's talk "Pragmatic Unicode, or How Do I Stop the Pain?" covers everything you need.

If you get that error on that line of code, the problem is that string is a byte string, and Python 2 is implicitly trying to decode it to Unicode for you using the ASCII codec. But the bytes aren't pure ASCII, so the implicit decode fails. You need to find out what the encoding actually is and decode the string explicitly with it.
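A minimal sketch of the failure and the fix, written in Python 3 (where bytes plays the role of Python 2's str). The sample bytes are an assumption for illustration: they are the UTF-8 encoding of "café". Decoding them as ASCII raises the same UnicodeDecodeError that Python 2's implicit decode produces, while decoding with the encoding the bytes were really written in succeeds.

```python
raw = b"caf\xc3\xa9"  # example data: UTF-8 bytes for "café"

# Decoding as ASCII fails: byte 0xc3 is outside ASCII's 0-127 range.
# This is the same error Python 2 raises when it decodes implicitly.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc  # keep a reference; 'exc' is cleared after the block

# Decoding with the encoding the bytes were actually written in succeeds.
text = raw.decode("utf-8")
print(text)
print(ascii_error.reason)  # ordinal not in range(128)
```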


In Python 2, the encode method should be used on unicode objects to convert them to str objects in a given encoding. The decode method should be used on str objects in a given encoding to convert them to unicode objects.
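To make the direction of each method concrete, here is a small round trip in Python 3, where str holds text (Python 2's unicode) and bytes holds encoded data (Python 2's str). The sample string is only an illustration. The same text produces different bytes in different encodings, and decode only gets the text back when it is given the matching encoding.

```python
# Python 3 naming: str holds text (Python 2's unicode); bytes holds
# encoded data (Python 2's str).
text = "naïve café"

utf8_bytes = text.encode("utf-8")      # text -> bytes
latin1_bytes = text.encode("latin-1")  # same text, different bytes

# decode() reverses encode(), but only with the matching encoding.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text

print(len(text), len(utf8_bytes), len(latin1_bytes))  # 10 12 10
```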

I assume your database stores strings in UTF-8. So when you read strings from the database, convert them to unicode objects with str.decode('utf-8'). Then use only unicode objects inside your Python program (unicode literals are written u'unicode string'). Just before storing them in the database, convert them back to str objects with uni.encode('utf-8').
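The "decode on the way in, encode on the way out" pattern can be sketched with the standard sqlite3 module. This is a Python 3 sketch under a few assumptions: the helper names to_db and from_db and the notes table are hypothetical, and the column is declared BLOB so the encoded bytes round-trip unchanged. Note that Python 3's sqlite3 already converts TEXT columns to and from str as UTF-8 by default, so this explicit version mainly illustrates where the boundaries sit.

```python
import sqlite3

def to_db(text):
    """Hypothetical helper: encode text to UTF-8 just before storing it."""
    return text.encode("utf-8")

def from_db(raw):
    """Hypothetical helper: decode UTF-8 bytes right after reading them."""
    return raw.decode("utf-8")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body BLOB)")  # hypothetical table

# Inside the program we only handle text (str in Python 3).
note = "Grüße aus Köln"
conn.execute("INSERT INTO notes (body) VALUES (?)", (to_db(note),))

(raw,) = conn.execute("SELECT body FROM notes").fetchone()
restored = from_db(raw)  # back to text immediately after reading
print(restored)
conn.close()
```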


EDIT: As the downvotes show, this is NOT THE BEST WAY TO DO IT. There is an excellent, highly recommended answer elsewhere on this page; if you are looking for a good solution, please use that. This is a hackish workaround that will come back to bite you later.

I feel your pain; I've had a lot of problems with the same error. The simplest way I solved it (this might not be the best way, and it depends on your application) was to convert things to unicode and ignore errors. Here's an example from the Unicode HOWTO in the Python v2.7.3 documentation:

    >>> unicode('\x80abc', errors='strict')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
                        ordinal not in range(128)
    >>> unicode('\x80abc', errors='replace')
    u'\ufffdabc'
    >>> unicode('\x80abc', errors='ignore')
    u'abc'
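The unicode() builtin no longer exists in Python 3. A rough Python 3 equivalent of the session above, offered as a sketch, calls bytes.decode with the same errors argument. ascii() is used when printing so the U+FFFD replacement character shows up as an escape rather than as a glyph.

```python
raw = b"\x80abc"  # 0x80 is not a valid ASCII byte

# errors="strict" (the default) raises UnicodeDecodeError.
try:
    raw.decode("ascii", errors="strict")
except UnicodeDecodeError as exc:
    strict_error = exc  # keep a reference; 'exc' is cleared after the block

replaced = raw.decode("ascii", errors="replace")  # bad byte -> U+FFFD
ignored = raw.decode("ascii", errors="ignore")    # bad byte dropped

print(ascii(replaced))  # '\ufffdabc'
print(ascii(ignored))   # 'abc'
```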

While this might not be the cleanest method, it has worked for me.

EDIT:

A couple of people in the comments have pointed out that this is a bad idea, even though the asker accepted the answer. It is NOT a great idea: it will mangle text containing European accented characters (or any other non-ASCII text), because those characters are silently dropped or replaced. However, you can use it if this is NOT production-level code, for example a personal project where you need a quick fix to get things rolling. You will eventually need to fix it with the proper methods described in the other answers.
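To see why accented characters get mangled, here is a short Python 3 sketch (the word "café" is just sample data). When UTF-8 bytes are decoded as ASCII with errors="ignore", the two bytes that encode "é" vanish; with errors="replace", each of those bytes becomes its own U+FFFD. Only decoding with the correct encoding preserves the text.

```python
original = "café"
raw = original.encode("utf-8")  # b'caf\xc3\xa9': é becomes two bytes

# Decoding UTF-8 bytes as ASCII while ignoring errors silently drops é.
ignored = raw.decode("ascii", errors="ignore")
# With errors="replace", each of é's two bytes becomes a separate U+FFFD.
replaced = raw.decode("ascii", errors="replace")
# Decoding with the correct encoding keeps the text intact.
correct = raw.decode("utf-8")

print(ascii(ignored))   # 'caf'
print(ascii(replaced))  # 'caf\ufffd\ufffd'
print(ascii(correct))   # 'caf\xe9'
```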