How to encode UTF8 filename for HTTP headers? (Python, Django) How to encode UTF8 filename for HTTP headers? (Python, Django) python python

How to encode UTF8 filename for HTTP headers? (Python, Django)


This is a FAQ.

There is no interoperable way to do this. Some browsers implement proprietary extensions (IE, Chrome), other implement RFC 2231 (Firefox, Opera).

See test cases at http://greenbytes.de/tech/tc2231/.

Update: as of November 2012, all current desktop browsers support the encoding defined in RFC 6266 and RFC 5987 (Safari >= 6, IE >= 9, Chrome, Firefox, Opera, Konqueror).


Don't send a filename in Content-Disposition. There is no way to make non-ASCII header parameters work cross-browser(*).

Instead, send just “Content-Disposition: attachment”, and leave the filename as a URL-encoded UTF-8 string in the trailing (PATH_INFO) part of your URL, for the browser to pick up and use by default. UTF-8 URLs are handled much more reliably by browsers than anything to do with Content-Disposition.

(*: actually, there's not even a current standard that says how it should be done as the relationships between RFCs 2616, 2231 and 2047 are pretty dysfunctional, something that Julian is trying to get cleared up at a spec level. Consistent browser support is in the distant future.)


Note that in 2011, RFC 6266 (especially Appendix D) weighed in on this issue and has specific recommendations to follow.

Namely, you can issue a filename with only ASCII characters, followed by filename* with a RFC 5987-formatted filename for those agents that understand it.

Typically this will look like filename="my-resume.pdf"; filename*=UTF-8''My%20R%C3%A9sum%C3%A9.pdf, where the Unicode filename ("My Résumé.pdf") is encoded into UTF-8 and then percent-encoded (note, do NOT use + for spaces).

Please do actually read RFC 6266 and RFC 5987 (or use a robust and tested library that abstracts this for you), as my summary here is lacking in important detail.