How to print UTF-8 encoded text to the console in Python < 3?


It seems accomplishing this is not recommended.

Fedora suggested using the system locale as the default, but apparently this breaks other things.

Here's a quote from the mailing-list discussion:

The only supported default encodings in Python are:

    Python 2.x: ASCII
    Python 3.x: UTF-8

If you change these, you are on your own and strange things will start to happen. The default encoding does not only affect the translation between Python and the outside world, but also all internal conversions between 8-bit strings and Unicode.

Hacks like what's happening in the pango module (setting the default encoding to 'utf-8' by reloading the site module in order to get the sys.setdefaultencoding() API back) are just downright wrong and will cause serious problems since Unicode objects cache their default encoded representation.

Please don't enable the use of a locale based default encoding.

If all you want to achieve is getting the encodings of stdout and stdin correctly setup for pipes, you should instead change the .encoding attribute of those (only).

-- Marc-Andre Lemburg, eGenix.com


This is how I do it:

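    #!/usr/bin/python2.7 -S
    import sys
    # Possible only because -S kept site from removing this function.
    sys.setdefaultencoding("utf-8")
    # Import site manually to get the rest of the normal startup behaviour.
    import site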

Note the -S in the shebang line. It tells Python not to import the site module automatically. The site module is what sets the default encoding and then removes the method so it can't be set again, but it will honor what is already set.


How to print UTF-8 encoded text to the console in Python < 3?

    print u"some unicode text \N{EURO SIGN}"
    print b"some utf-8 encoded bytestring \xe2\x82\xac".decode('utf-8')

i.e., if you have a Unicode string then print it directly. If you have a bytestring then convert it to Unicode first.

Your locale settings (LANG, LC_CTYPE) indicate a utf-8 locale and therefore (in theory) you could print a utf-8 bytestring directly and it should be displayed correctly in your terminal (if terminal settings are consistent with the locale settings, and they should be), but you should avoid it: do not hardcode the character encoding of your environment inside your script; print Unicode directly instead.

There are many wrong assumptions in your question.

You do not need to set PYTHONIOENCODING with your locale settings to print Unicode to the terminal. A utf-8 locale supports all Unicode characters, i.e., it works as is.

You do not need the workaround sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout). It may break if some code (that you do not control) does need to print bytes, and/or it may break while printing Unicode to the Windows console (wrong codepage, can't print undecodable characters). Correct locale settings and/or the PYTHONIOENCODING envvar are enough. Also, if you need to replace sys.stdout, then use io.TextIOWrapper() instead of the codecs module, as the win-unicode-console package does.
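A minimal sketch of the io.TextIOWrapper() approach mentioned above, in case replacing a stream really is necessary. The helper name make_text_stream and the in-memory sink are illustrative assumptions, not part of any package; the demonstration writes to a BytesIO object so that sys.stdout itself is left alone.

```python
from __future__ import print_function, unicode_literals

import io


def make_text_stream(binary_stream, encoding="utf-8", errors="strict"):
    """Wrap a binary stream so that Unicode text written to it is encoded."""
    return io.TextIOWrapper(binary_stream, encoding=encoding, errors=errors,
                            line_buffering=True)


# Demonstration on an in-memory sink, so nothing global is replaced.
sink = io.BytesIO()
stream = make_text_stream(sink)
stream.write("price: 5 \N{EURO SIGN}")
stream.flush()
encoded = sink.getvalue()  # the bytes that actually reached the sink

# To wrap the real stdout, the binary layer would be sys.stdout.buffer on
# Python 3. On Python 2, one option (an assumption, not verified here) is
# io.open(sys.stdout.fileno(), "wb", closefd=False).
```

Note that a TextIOWrapper only accepts Unicode text, so code that prints bytestrings to a replaced sys.stdout would still fail; that is the same risk described above for the codecs workaround.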

sys.getdefaultencoding() is unrelated to your locale settings and to PYTHONIOENCODING. Your assumption that setting PYTHONIOENCODING should change sys.getdefaultencoding() is incorrect. You should check sys.stdout.encoding instead.

sys.getdefaultencoding() is not used when you print to the console. It may be used as a fallback on Python 2 if stdout is redirected to a file/pipe unless PYTHONIOENCODING is set:

    $ python2 -c'import sys; print(sys.stdout.encoding)'
    UTF-8
    $ python2 -c'import sys; print(sys.stdout.encoding)' | cat
    None
    $ PYTHONIOENCODING=utf8 python2 -c'import sys; print(sys.stdout.encoding)' | cat
    utf8
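The same check can be done from inside a script. The following is only a diagnostic sketch: the function name encoding_report and its dictionary keys are made up for illustration. It reports the three values side by side so it is easy to see that they are independent of each other.

```python
from __future__ import print_function

import os
import sys


def encoding_report(stream=None, environ=None):
    """Collect the values that are often confused with one another."""
    stream = sys.stdout if stream is None else stream
    environ = os.environ if environ is None else environ
    return {
        # What print uses for this stream; may be None on Python 2
        # when the stream is redirected to a file or pipe.
        "stdout_encoding": getattr(stream, "encoding", None),
        # Used for implicit str/unicode conversions on Python 2.
        "default_encoding": sys.getdefaultencoding(),
        # None when the environment variable is not set.
        "PYTHONIOENCODING": environ.get("PYTHONIOENCODING"),
    }


report = encoding_report()
for name in sorted(report):
    print("%s = %r" % (name, report[name]))
```

Running it once directly and once piped through cat should show stdout_encoding changing while default_encoding stays the same.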

Do not call sys.setdefaultencoding("UTF-8"); it may corrupt your data silently and/or break 3rd-party modules that do not expect it. Remember sys.getdefaultencoding() is used to convert bytestrings (str) to/from unicode in Python 2 implicitly, e.g., "a" + u"b". See also the quote in @mesilliac's answer.
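To illustrate the implicit conversion, here is a small sketch with made-up sample strings. Assuming the default encoding has not been changed, mixing a non-ASCII bytestring with a Unicode string fails loudly (UnicodeDecodeError on Python 2, TypeError on Python 3), whereas decoding explicitly works regardless of the default encoding.

```python
from __future__ import print_function

raw = b"caf\xc3\xa9"    # UTF-8 encoded bytestring (sample data)
suffix = u" au lait"    # Unicode string

# Implicit mixing: Python 2 tries raw.decode(sys.getdefaultencoding()),
# Python 3 refuses to mix bytes and text at all.
try:
    mixed = raw + suffix
    failure = None
except (UnicodeDecodeError, TypeError) as exc:
    mixed = None
    failure = type(exc).__name__

# Explicit decoding states the encoding of the data, not of the environment.
explicit = raw.decode("utf-8") + suffix

print("implicit mixing failed with:", failure)
print("explicit result:", repr(explicit))
```

With the default encoding changed to UTF-8, the implicit line would succeed on Python 2, and bytes in any other encoding would then either raise somewhere unexpected or be decoded incorrectly; the loud failure is the safer behaviour.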