Saving utf-8 texts with json.dumps as UTF8, not as \u escape sequence Saving utf-8 texts with json.dumps as UTF8, not as \u escape sequence python python

Saving utf-8 texts with json.dumps as UTF8, not as \u escape sequence


Use the ensure_ascii=False switch to json.dumps(), then encode the value to UTF-8 manually:

>>> json_string = json.dumps("ברי צקלה", ensure_ascii=False).encode('utf8')>>> json_stringb'"\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94"'>>> print(json_string.decode())"ברי צקלה"

If you are writing to a file, just use json.dump() and leave it to the file object to encode:

with open('filename', 'w', encoding='utf8') as json_file:    json.dump("ברי צקלה", json_file, ensure_ascii=False)

Caveats for Python 2

For Python 2, there are some more caveats to take into account. If you are writing this to a file, you can use io.open() instead of open() to produce a file object that encodes Unicode values for you as you write, then use json.dump() instead to write to that file:

with io.open('filename', 'w', encoding='utf8') as json_file:    json.dump(u"ברי צקלה", json_file, ensure_ascii=False)

Do note that there is a bug in the json module where the ensure_ascii=False flag can produce a mix of unicode and str objects. The workaround for Python 2 then is:

with io.open('filename', 'w', encoding='utf8') as json_file:    data = json.dumps(u"ברי צקלה", ensure_ascii=False)    # unicode(data) auto-decodes data to unicode if str    json_file.write(unicode(data))

In Python 2, when using byte strings (type str), encoded to UTF-8, make sure to also set the encoding keyword:

>>> d={ 1: "ברי צקלה", 2: u"ברי צקלה" }>>> d{1: '\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94', 2: u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'}>>> s=json.dumps(d, ensure_ascii=False, encoding='utf8')>>> su'{"1": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4", "2": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"}'>>> json.loads(s)['1']u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'>>> json.loads(s)['2']u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'>>> print json.loads(s)['1']ברי צקלה>>> print json.loads(s)['2']ברי צקלה


To write to a file

import codecsimport jsonwith codecs.open('your_file.txt', 'w', encoding='utf-8') as f:    json.dump({"message":"xin chào việt nam"}, f, ensure_ascii=False)

To print to stdout

import jsonprint(json.dumps({"message":"xin chào việt nam"}, ensure_ascii=False))


UPDATE: This is wrong answer, but it's still useful to understand why it's wrong. See comments.

How about unicode-escape?

>>> d = {1: "ברי צקלה", 2: u"ברי צקלה"}>>> json_str = json.dumps(d).decode('unicode-escape').encode('utf8')>>> print json_str{"1": "ברי צקלה", "2": "ברי צקלה"}