Flask - headers are not converted to unicode?
At http://flask.pocoo.org/docs/api/#flask.request we read
The request object is an instance of a Request subclass and provides all of the attributes Werkzeug defines.
The word Request
links to http://werkzeug.pocoo.org/docs/wrappers/#werkzeug.wrappers.Request where we read
The Request and Response classes subclass the BaseRequest and BaseResponse classes and implement all the mixins Werkzeug provides:
The word BaseRequest
links to http://werkzeug.pocoo.org/docs/wrappers/#werkzeug.wrappers.BaseRequest where we read
headers
The headers from the WSGI environ as immutable EnvironHeaders.
The word EnvironHeaders
links to http://werkzeug.pocoo.org/docs/datastructures/#werkzeug.datastructures.EnvironHeaders where we read
This provides the same interface as Headers and is constructed from a WSGI environment.
The word Headers is... no, it's not linked, but it should have been linked to http://werkzeug.pocoo.org/docs/datastructures/#werkzeug.datastructures.Headers where we read
Headers is mostly compatible with the Python wsgiref.headers.Headers class
where the phrase wsgiref.headers.Headers
links to http://docs.python.org/dev/library/wsgiref.html#wsgiref.headers.Headers where we read
Create a mapping-like object wrapping headers, which must be a list of header name/value tuples as described in PEP 3333.
The phrase PEP 3333
links to http://www.python.org/dev/peps/pep-3333/ where there's no explicit definition of what type headers should be, but after searching for the word headers for a while we find this statement:
WSGI therefore defines two kinds of "string":
"Native" strings (which are always implemented using the type named str) that are used for request/response headers and metadata
"Bytestrings" (which are implemented using the `bytes` type in Python 3, and `str` elsewhere), that are used for the bodies of requests and responses (e.g. POST/PUT input data and HTML page outputs).
That's why in Python 2 you get headers as str, not unicode.
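A quick way to see the "native string" rule in action is to build the wsgiref.headers.Headers object mentioned above and look at the type of a value. This is only a sketch: the header names and values are made up for illustration, and the result depends on the interpreter (a byte string under Python 2, a text string under Python 3).

```python
from wsgiref.headers import Headers

# Header name/value tuples as PEP 3333 describes them: native strings.
# "X-Example" and its value are hypothetical, chosen only for this demo.
headers = Headers([("Content-Type", "text/plain"), ("X-Example", "value")])

value = headers["X-Example"]

# The value is whatever the running interpreter calls `str`:
# bytes-based in Python 2, unicode-based in Python 3.
is_native = type(value) is str
print(type(value).__name__, is_native)
```
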
Now let's move to decoding.
Neither your .decode('utf-8') nor mensi's .decode('ascii') (nor blindly expecting any other encoding) is universally good, because
In theory, HTTP header field values can transport anything; the tricky part is to get all parties (sender, receiver, and intermediates) to agree on the encoding.
Having said that, I think you should act according to Julian Reschke's advice
Thus, the safe way to do this is to stick to ASCII, and choose an encoding on top of that, such as the one defined in RFC 5987.
after checking that the User Agents (browsers) you support have implemented it.
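To make the point about blind decoding concrete, here is a small sketch. The helper try_decode and the sample value are assumptions made up for this illustration, not part of Flask or Werkzeug. The same text sent by two different clients fails under one guess and comes out silently wrong under another.

```python
# The same text, as two hypothetical clients might put it on the wire.
utf8_bytes = u"caf\u00e9".encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = u"caf\u00e9".encode("latin-1")  # b'caf\xe9'


def try_decode(raw, encoding):
    """Return the decoded text, or None if raw is not valid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Guessing 'ascii' rejects any non-ASCII value outright.
ascii_guess = try_decode(utf8_bytes, "ascii")

# Guessing 'utf-8' rejects the Latin-1 sender.
utf8_guess = try_decode(latin1_bytes, "utf-8")

# Guessing 'latin-1' never fails, but can silently return the wrong text.
latin1_guess = try_decode(utf8_bytes, "latin-1")
```

Only a value that is pure ASCII decodes the same way under all three guesses, which is why the advice is to stay within ASCII on the wire.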
The title of RFC 5987 is Character Set and Language Encoding for Hypertext Transfer Protocol (HTTP) Header Field Parameters.
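For a rough idea of what "an encoding on top of ASCII" looks like, here is a minimal Python 3 sketch of the RFC 5987 extended value form (charset, then language, then percent-encoded octets, separated by single quotes). The function names encode_ext_value and decode_ext_value are hypothetical, it does no validation, and it is not a complete implementation of the RFC.

```python
from urllib.parse import quote, unquote


def encode_ext_value(text, charset="UTF-8", language=""):
    """Build an ext-value of the form charset'language'percent-encoded-octets."""
    # safe="" percent-encodes everything except letters, digits and "_.-~",
    # so the result contains only ASCII characters.
    return "{}'{}'{}".format(charset, language, quote(text, safe="", encoding=charset))


def decode_ext_value(ext_value):
    """Split an ext-value and decode its octets using the declared charset."""
    charset, language, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset), language


encoded = encode_ext_value("\u20ac rates")
decoded, lang = decode_ext_value(encoded)
print(encoded)
```

The receiver no longer has to guess: the charset travels with the value, and everything on the wire is plain ASCII.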