Python - 'ascii' codec can't decode byte

python python-2.7 unicode python-2.x python-unicode

"你好".encode('utf-8')

encode converts a unicode object to a string object. But here you have invoked it on a string object (because you don't have the u). So python has to convert the string to a unicode object first. So it does the equivalent of

"你好".decode().encode('utf-8')

But the decode fails because the string isn't valid ascii. That's why you get a complaint about not being able to decode.

python python-2.7 unicode python-2.x python-unicode

Always encode from unicode to bytes.
In this direction, you get to choose the encoding.

>>> u"你好".encode("utf8")'\xe4\xbd\xa0\xe5\xa5\xbd'>>> print _你好

The other way is to decode from bytes to unicode.
In this direction, you have to know what the encoding is.

>>> bytes = '\xe4\xbd\xa0\xe5\xa5\xbd'>>> print bytes你好>>> bytes.decode('utf-8')u'\u4f60\u597d'>>> print _你好

This point can't be stressed enough. If you want to avoid playing unicode "whack-a-mole", it's important to understand what's happening at the data level. Here it is explained another way:

A unicode object is decoded already, you never want to call decode on it.
A bytestring object is encoded already, you never want to call encode on it.

Now, on seeing .encode on a byte string, Python 2 first tries to implicitly convert it to text (a unicode object). Similarly, on seeing .decode on a unicode string, Python 2 implicitly tries to convert it to bytes (a str object).

These implicit conversions are why you can get UnicodeDecodeError when you've called encode. It's because encoding usually accepts a parameter of type unicode; when receiving a str parameter, there's an implicit decoding into an object of type unicode before re-encoding it with another encoding. This conversion chooses a default 'ascii' decoder^†, giving you the decoding error inside an encoder.

In fact, in Python 3 the methods str.decode and bytes.encode don't even exist. Their removal was a [controversial] attempt to avoid this common confusion.

^† _{...or whatever coding sys.getdefaultencoding() mentions; usually this is 'ascii'}

python python-2.7 unicode python-2.x python-unicode

You can try this

import sysreload(sys)sys.setdefaultencoding("utf-8")

You can also try following

Add following line at top of your .py file.

# -*- coding: utf-8 -*-

CodeHunter

Python - 'ascii' codec can't decode byte

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last