
Setting the correct encoding when piping stdout in Python

First, regarding this solution:

# -*- coding: utf-8 -*-
print u"åäö".encode('utf-8')

It's not practical to explicitly print with a given encoding every time. That would be repetitive and error-prone.

A better solution is to change sys.stdout at the start of your program so that it encodes with a chosen encoding. Here is one solution I found in the question "Python: How is sys.stdout.encoding chosen?", in particular in a comment by "toka":

import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
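The snippet above is Python 2. In Python 3, sys.stdout is already a text stream, so the same idea applies to its underlying binary stream, sys.stdout.buffer. Here is a minimal sketch, demonstrated against an in-memory io.BytesIO instead of the real stdout so it runs on its own (the helper name utf8_writer is hypothetical):

```python
import codecs
import io


def utf8_writer(byte_stream):
    """Return a StreamWriter that encodes every str it receives as UTF-8."""
    return codecs.getwriter("utf-8")(byte_stream)


# Demo against an in-memory byte stream instead of the real stdout.
buf = io.BytesIO()
writer = utf8_writer(buf)
writer.write("åäö\n")
print(buf.getvalue())  # b'\xc3\xa5\xc3\xa4\xc3\xb6\n'

# At the top of a real Python 3 program, the equivalent would be:
#   sys.stdout = utf8_writer(sys.stdout.buffer)
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") is a simpler alternative that keeps sys.stdout a regular text stream.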

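Your code works when run in a script because Python encodes the output to whatever encoding your terminal application is using. If you are piping, you must encode it yourself: in Python 2, sys.stdout.encoding is None when stdout is a pipe, so printing a unicode string falls back to ASCII and fails with UnicodeEncodeError on characters such as åäö.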
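To see the difference yourself, the sketch below (Python 3; the CHILD program is made up for illustration) starts a child interpreter with its stdout connected to a pipe. The child confirms it is not writing to a terminal, then encodes the text itself and writes the raw bytes to sys.stdout.buffer:

```python
import subprocess
import sys

# Child program (Python 3): report whether stdout is a terminal, then
# encode "åäö" explicitly and write the raw bytes, bypassing whatever
# text encoding Python picked for the pipe.
CHILD = (
    "import sys\n"
    "print(sys.stdout.isatty())\n"
    "sys.stdout.flush()  # keep the text and the raw bytes in order\n"
    "sys.stdout.buffer.write('\\u00e5\\u00e4\\u00f6\\n'.encode('utf-8'))\n"
)

result = subprocess.run([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
lines = result.stdout.splitlines()
print(lines[0])  # b'False': the child's stdout is a pipe
print(lines[1])  # b'\xc3\xa5\xc3\xa4\xc3\xb6': UTF-8 bytes for "åäö"
```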

A rule of thumb is: Always use Unicode internally. Decode what you receive, and encode what you send.

# -*- coding: utf-8 -*-
print u"åäö".encode('utf-8')

Another didactic example is a Python program that converts ISO-8859-1 input to UTF-8 output, upper-casing everything in between.

import sys

for line in sys.stdin:
    # Decode what you receive:
    line = line.decode('iso8859-1')

    # Work with Unicode internally:
    line = line.upper()

    # Encode what you send:
    line = line.encode('utf-8')
    sys.stdout.write(line)
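The same filter as a Python 3 sketch follows. In Python 3, sys.stdin and sys.stdout are already text streams, so a byte-level filter reads from and writes to their .buffer attributes. The function name latin1_to_upper_utf8 is hypothetical, and in-memory streams stand in for stdin and stdout so the example runs on its own:

```python
import io


def latin1_to_upper_utf8(src, dst):
    """Read ISO-8859-1 bytes from src, write upper-cased UTF-8 bytes to dst."""
    for raw in src:                          # raw is one line of bytes
        text = raw.decode("iso8859-1")       # decode what you receive
        text = text.upper()                  # work with str (Unicode) internally
        dst.write(text.encode("utf-8"))      # encode what you send


# Demo with in-memory byte streams; a real filter would call
# latin1_to_upper_utf8(sys.stdin.buffer, sys.stdout.buffer).
src = io.BytesIO("åäö\n".encode("iso8859-1"))
dst = io.BytesIO()
latin1_to_upper_utf8(src, dst)
print(dst.getvalue())  # b'\xc3\x85\xc3\x84\xc3\x96\n' (UTF-8 for "ÅÄÖ\n")
```

Because the output is UTF-8 rather than ISO-8859-1, upper-casing is safe even for characters whose uppercase form is outside Latin-1, such as ÿ, which becomes Ÿ (U+0178).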

Setting the system default encoding (sys.setdefaultencoding() in Python 2) is a bad idea, because some modules and libraries you use may rely on it being ASCII. Don't do it.
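This advice concerns Python 2. In Python 3 the question mostly goes away, as a quick check on any Python 3 interpreter suggests: the default string encoding is fixed to UTF-8, and there is no setdefaultencoding() to call.

```python
import sys

# Python 3: the default string encoding is always UTF-8 ...
print(sys.getdefaultencoding())            # utf-8
# ... and the setter no longer exists on the sys module.
print(hasattr(sys, "setdefaultencoding"))  # False
```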

You may want to try setting the environment variable PYTHONIOENCODING to "utf_8". I have written a page about my ordeal with this problem.
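Here is a minimal Python 3 sketch of the effect, using only the standard library. It launches a child interpreter with stdout piped and PYTHONIOENCODING set, then checks which encoding the child chose and which bytes it produced. Newer Python versions may report the name in normalized form (utf-8 rather than utf_8), so the sketch compares names via codecs.lookup():

```python
import codecs
import os
import subprocess
import sys

# Child program: print the encoding Python chose for stdout, then print
# "ö☺☻" as a normal str and let stdout do the encoding.
CHILD = "import sys; print(sys.stdout.encoding); print('\\u00f6\\u263a\\u263b')"

env = dict(os.environ, PYTHONIOENCODING="utf_8")
result = subprocess.run([sys.executable, "-c", CHILD],
                        stdout=subprocess.PIPE, env=env)

enc_line, text_line = result.stdout.splitlines()
chosen = codecs.lookup(enc_line.decode("ascii")).name
print(chosen)                                             # utf-8
print(text_line == "\u00f6\u263a\u263b".encode("utf-8"))  # True
```

From a shell, the equivalent is to prefix the command, for example PYTHONIOENCODING=utf_8 python script.py | less (script.py is a placeholder).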

TL;DR of the blog post:

import sys, locale, os
print(sys.stdout.encoding)
print(sys.stdout.isatty())
print(locale.getpreferredencoding())
print(sys.getfilesystemencoding())
print(os.environ["PYTHONIOENCODING"])
print(chr(246), chr(9786), chr(9787))

gives you (when run with stdout piped and PYTHONIOENCODING set to utf_8):

utf_8
False
ANSI_X3.4-1968
ascii
utf_8
ö ☺ ☻
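The os.environ["PYTHONIOENCODING"] lookup in that script raises KeyError when the variable is not set. Here is a variant that tolerates a missing variable. It is sketched with a hypothetical encoding_report() helper and collects the same settings into a dict:

```python
import locale
import os
import sys


def encoding_report():
    """Collect the settings that decide how text reaches stdout.

    Unlike indexing os.environ directly, .get() returns None when
    PYTHONIOENCODING is not set instead of raising KeyError.
    """
    return {
        "stdout.encoding": sys.stdout.encoding,
        "stdout.isatty": sys.stdout.isatty(),
        "preferred": locale.getpreferredencoding(),
        "filesystem": sys.getfilesystemencoding(),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


for key, value in encoding_report().items():
    print(f"{key}: {value}")
```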