How to encode Chinese character as 'gbk' in json, to format a url request parameter String?


You almost got it with ensure_ascii=False. This works:

jsonStr = json.dumps(d, encoding='gbk', ensure_ascii=False).encode('gbk')

You need to tell json.dumps() that the byte strings it reads are GBK-encoded (encoding='gbk'), and that it should not escape non-ASCII characters (ensure_ascii=False). Then you must encode the result to GBK yourself with .encode('gbk'), because json.dumps() has no separate option for the output encoding.
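For readers on Python 3, where strings are already Unicode and json.dumps() does not take an encoding argument, the same idea can be sketched as below. This is a minimal sketch rather than the code from the question: the dictionary contents and the variable names json_text, json_bytes and round_tripped are assumptions chosen for illustration.

```python
import json

# Hypothetical input, modelled on the question's dictionary.
d = {"key": "上海", "num": 1}

# ensure_ascii=False keeps the Chinese characters instead of \uXXXX escapes.
json_text = json.dumps(d, ensure_ascii=False)

# json.dumps() returns text (str); the byte encoding is chosen in a separate step.
json_bytes = json_text.encode("gbk")

# A receiver that knows the payload is GBK can recover the original data.
round_tripped = json.loads(json_bytes.decode("gbk"))
```

The separate .encode("gbk") call plays the same role as in the one-liner above: serialising to JSON text and choosing a byte encoding are two distinct steps.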

This solution is similar to another answer here: https://stackoverflow.com/a/18337754/4323

So this does what you want, though I should note that the standard for URIs seems to say that they should be in UTF-8 whenever possible. For more on this, see here: https://stackoverflow.com/a/14001296/4323
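To see what the choice between UTF-8 and GBK means at the URL level, here is a small Python 3 sketch using urllib.parse.quote, which percent-encodes as UTF-8 unless told otherwise. The variable names are illustrative assumptions, and the example encodes the bare string rather than a JSON document.

```python
from urllib.parse import quote, unquote

city = "上海"

# quote() uses UTF-8 by default.
utf8_param = quote(city)

# The encoding argument selects a different charset for the percent-encoded bytes.
gbk_param = quote(city, encoding="gbk")

# The receiver must decode with the same charset the sender used.
decoded_ok = unquote(gbk_param, encoding="gbk")
decoded_wrong = unquote(gbk_param)  # assumes UTF-8, so the text is not recovered
```

Nothing in the percent-encoded parameter itself says which charset was used, so the sender and the target application have to agree on it out of band.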


"key":"上海",

You saved your source code as UTF-8, so this is the byte string '\xe4\xb8\x8a\xe6\xb5\xb7'.

jsonStr = json.dumps(d,encoding='gbk')

The JSON format supports only Unicode strings. The encoding parameter can be used to force json.dumps into allowing byte strings, automatically decoding them to Unicode using the given encoding.

However, the byte string's encoding is actually UTF-8, not 'gbk', so json.dumps decodes it incorrectly, giving u'涓婃捣'. It then produces the incorrect JSON output "\u6d93\u5a43\u6363", which gets URL-encoded to %22%5Cu6d93%5Cu5a43%5Cu6363%22.
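The mis-decoding described above can be reproduced explicitly. The sketch below is written for Python 3, where the decode step has to be spelled out by hand instead of happening inside json.dumps; the variable names are assumptions for illustration.

```python
import json

# The bytes that a UTF-8 source file contains for the literal "上海".
utf8_bytes = "上海".encode("utf-8")

# Decoding those UTF-8 bytes as GBK silently yields different characters.
misread = utf8_bytes.decode("gbk")

# json.dumps() then faithfully escapes the wrong characters.
bad_json = json.dumps(misread)
```

No exception is raised at any point, which is why this kind of bug usually only shows up as garbled text on the receiving side.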

To fix this you should make the input to json.dumps a proper Unicode (u'') string:

# coding: utf-8
d = {
    "key": u"上海",  # or u'\u4e0a\u6d77' if you don't want to rely on the coding decl
    "num": 1
}
jsonStr = json.dumps(d)
...

This will get you JSON "\u4e0a\u6d77", encoding to URL %22%5Cu4e0a%5Cu6d77%22.
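As a quick check of those values, here is a Python 3 sketch that serialises just the string value, URL-encodes it, and reverses both steps. It uses the bare string instead of the whole dictionary so the output matches the figures quoted above; the variable names are illustrative.

```python
import json
from urllib.parse import quote, unquote

# With the default ensure_ascii=True, non-ASCII characters become \uXXXX escapes.
json_text = json.dumps("上海")

# The escaped JSON is pure ASCII, so URL-encoding does not depend on any charset.
url_param = quote(json_text)

# Any JSON parser can reverse this without knowing about GBK or UTF-8.
decoded = json.loads(unquote(url_param))
```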

If you really don't want the \u escapes in your JSON, you can indeed pass ensure_ascii=False and then .encode() the output before URL-encoding. But I wouldn't recommend it, as you would then have to worry about which encoding the target application expects in its URL parameters, which is a source of some pain. The \u version is accepted by all JSON parsers, and is typically not much longer once URL-encoded.
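To put rough numbers on the size claim, the following Python 3 sketch compares the URL-encoded length of the escaped JSON with the raw GBK and raw UTF-8 variants for the single value from the question. The names are assumptions for illustration, and other inputs will give different ratios.

```python
import json
from urllib.parse import quote, unquote_to_bytes

value = "上海"

# Variant 1: \u escapes, charset-independent.
escaped_param = quote(json.dumps(value))

# Variants 2 and 3: raw characters, encoded to bytes before URL-encoding.
raw_json = json.dumps(value, ensure_ascii=False)
gbk_param = quote(raw_json.encode("gbk"))
utf8_param = quote(raw_json.encode("utf-8"))

lengths = {
    "escaped": len(escaped_param),
    "gbk": len(gbk_param),
    "utf8": len(utf8_param),
}

# The raw variants only decode correctly if the receiver picks the matching charset.
recovered = json.loads(unquote_to_bytes(gbk_param).decode("gbk"))
```

For this value the escaped form comes out a little longer than the GBK form and a little shorter than the UTF-8 form, which fits the point that the escapes cost little once everything is percent-encoded.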