
python byte string encode and decode


You need to examine the documentation for the software API that you are using. BLOB is an acronym: BINARY Large Object.

If your data is in fact binary, the idea of decoding it to Unicode is of course nonsense.

If it is in fact text, you need to know what encoding to use to decode it to Unicode.

Then you use json.dumps(a_Python_object) on the decoded text. If you encode it to UTF-8 yourself first, json (in Python 2) will just decode it back again:

>>> import json
>>> json.dumps(u"\u0100\u0404")
'"\\u0100\\u0404"'
>>> json.dumps(u"\u0100\u0404".encode('utf8'))
'"\\u0100\\u0404"'
>>>
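For readers on Python 3, where `bytes` and `str` take the place of Python 2's `str` and `unicode`, the same idea might look roughly like the sketch below. The sample bytes and the choice of UTF-8 as the source encoding are assumptions for illustration only; use whatever encoding your API actually documents.

```python
import json

# Assumed sample data: the UTF-8 encoding of U+0100 and U+0404.
raw = "\u0100\u0404".encode("utf-8")

# Decode to text first, using the encoding the data is known to be in.
text = raw.decode("utf-8")

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True).
dumped = json.dumps(text)
print(dumped)

# Unlike Python 2, Python 3 does not decode bytes for you:
# passing bytes to json.dumps raises TypeError.
try:
    json.dumps(raw)
    bytes_accepted = True
except TypeError as exc:
    bytes_accepted = False
    print("bytes are rejected:", exc)
```

In other words, on Python 3 the explicit decode step is not optional: json only serializes text, so the encoding question has to be answered before the call.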

UPDATE about latin1:

u'\x80' is a useless meaningless C1 control character -- the encoding is extremely unlikely to be Latin-1. Latin-1 is "a snare and a delusion" -- all 8-bit bytes are decoded to Unicode without raising an exception. Don't confuse "works" and "doesn't raise an exception".
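A small Python 3 sketch of why "doesn't raise an exception" is not the same as "works". The euro-sign sample and the comparison with cp1252 are assumptions chosen for illustration; they are not taken from the original question.

```python
import unicodedata

# Latin-1 maps every byte 0x00-0xFF to a code point, so decoding never fails.
every_byte = bytes(range(256))
as_latin1 = every_byte.decode("latin-1")

# Assumed sample: the UTF-8 bytes for the euro sign, b'\xe2\x82\xac'.
euro_utf8 = "\u20ac".encode("utf-8")

right = euro_utf8.decode("utf-8")    # the single intended character
wrong = euro_utf8.decode("latin-1")  # three unrelated characters, no exception
print(repr(right), repr(wrong))

# The byte 0x80 is a C1 control character in Latin-1,
# but it is the euro sign in the Windows cp1252 code page.
in_latin1 = b"\x80".decode("latin-1")
in_cp1252 = b"\x80".decode("cp1252")
print(unicodedata.category(in_latin1), repr(in_cp1252))
```

A result full of C1 control characters (category `Cc` in the range U+0080 to U+009F) is a hint that the data was probably not Latin-1 in the first place.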


Use b.decode('name of source encoding') to get a unicode version. This was surprising to me when I learned it. For example:

In [123]: 'foo'.decode('latin-1')
Out[123]: u'foo'


I think what you are trying to do is decode a string object that is in some encoding. Do you know what that encoding is? If so, decode it to get the unicode object:

unicode_b = b.decode('some_encoding')

and then re-encode the unicode object with the utf_8 encoding to get back a string object:

b = unicode_b.encode('utf_8')

This uses the unicode object as a translator. Without knowing the original encoding of the string I can't say for certain, but the conversion may not go as expected. The unicode object is not meant for converting strings of one encoding to another. I would work with the unicode object, assuming you know what the encoding is, and convert back to an encoded string only when you want a string object back. If you don't know the encoding, there really isn't a way to find out without trial and error.
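The trial-and-error step could be sketched in Python 3 as below. The helper name `try_decodings`, the sample bytes, and the candidate list are all hypothetical, made up for this illustration. A candidate that decodes without error is only a possibility, not proof; you still have to look at the text and judge whether it is sensible.

```python
def try_decodings(data, candidates):
    """Return (encoding, text) pairs for every candidate that decodes cleanly.

    Hypothetical helper for illustration; not part of any library.
    """
    results = []
    for name in candidates:
        try:
            results.append((name, data.decode(name)))
        except UnicodeDecodeError:
            # This candidate cannot be the source encoding.
            continue
    return results


# Assumed sample: "café" encoded as cp1252, i.e. b'caf\xe9'.
sample = "caf\u00e9".encode("cp1252")
candidates = ["utf-8", "cp1252", "latin-1"]

survivors = try_decodings(sample, candidates)
for name, text in survivors:
    print(name, repr(text))

# Once you have settled on an encoding, go through text to reach UTF-8 bytes.
utf8_bytes = sample.decode("cp1252").encode("utf-8")
```

Here UTF-8 is ruled out because the lone byte 0xE9 is not valid UTF-8, while cp1252 and Latin-1 both survive and happen to agree on this input. On other inputs they can disagree, which is why a human check of the decoded text is still needed.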