python byte string encode and decode

python json unicode utf-8 python-unicode

You need to examine the documentation for the software API that you are using. BLOB is an acronym: BINARY Large Object.

If your data is in fact binary, the idea of decoding it to Unicode is of course nonsense.

If it is in fact text, you need to know what encoding to use to decode it to Unicode.

Then you pass the resulting unicode object to json.dumps(a_Python_object). There is no need to encode it to UTF-8 yourself; if you do, json will just decode it back again:

>>> import json
>>> json.dumps(u"\u0100\u0404")
'"\\u0100\\u0404"'
>>> json.dumps(u"\u0100\u0404".encode('utf8'))
'"\\u0100\\u0404"'
>>>
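The session above is Python 2. As a rough sketch of the same idea on Python 3 (an assumption, since the question does not say which version is in use), the bytes have to be decoded with a known encoding before json.dumps will accept them. The variable names and the sample bytes here are made up for illustration:

```python
import json

# Hypothetical sample: the UTF-8 bytes for U+0100 and U+0404.
raw = b"\xc4\x80\xd0\x84"

# Decode with the encoding you know the data is in (UTF-8 is assumed here).
text = raw.decode("utf-8")

# json.dumps takes text; by default non-ASCII characters are escaped.
encoded = json.dumps(text)

# Unlike Python 2, Python 3 refuses to serialize bytes directly.
try:
    json.dumps(raw)
    bytes_accepted = True
except TypeError:
    bytes_accepted = False
```

Decoding first keeps the choice of encoding explicit instead of leaving it to the json module.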

UPDATE about latin1:

u'\x80' is a useless meaningless C1 control character -- the encoding is extremely unlikely to be Latin-1. Latin-1 is "a snare and a delusion" -- all 8-bit bytes are decoded to Unicode without raising an exception. Don't confuse "works" and "doesn't raise an exception".
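A small sketch of why "doesn't raise an exception" proves nothing, written for Python 3 as an assumption. Every possible byte value decodes under Latin-1, while a stricter codec rejects the same byte. The cp1252 line is only one candidate guess for where a 0x80 byte might have come from, not a diagnosis of the original data:

```python
# All 256 byte values decode under Latin-1 without any error.
all_bytes = bytes(range(256))
as_latin1 = all_bytes.decode("latin-1")

# Byte 0x80 becomes the C1 control character U+0080, which is rarely
# what a text file actually contains.
c1_char = as_latin1[0x80]

# The same byte is not valid UTF-8 on its own, so a strict codec complains.
try:
    b"\x80".decode("utf-8")
    utf8_accepted = True
except UnicodeDecodeError:
    utf8_accepted = False

# In Windows-1252 (one possible candidate), 0x80 is the euro sign.
as_cp1252 = b"\x80".decode("cp1252")
```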


Use b.decode('name of source encoding') to get a unicode version. This was surprising to me when I learned it. For example:

In [123]: 'foo'.decode('latin-1')
Out[123]: u'foo'


I think what you are trying to do is decode a string object that is in some encoding. Do you know what that encoding is? If so, decode it to get the unicode object:

unicode_b = b.decode('some_encoding')

and then re-encode the unicode object using the utf_8 encoding to get back a string object:

b = unicode_b.encode('utf_8')

This uses the unicode object as a translator. Without knowing the original encoding of the string I can't say for certain, but it is possible that the conversion will not go as expected. The unicode object is not meant for converting strings from one encoding to another. Assuming you know what the encoding is, I would work with the unicode object and convert back to an encoded string only when you want a string object again. If you don't know what the encoding is, there really isn't a way to find out other than trial and error.
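The decode-then-encode round trip described above can be sketched as a small helper. This is written for Python 3 as an assumption (there the types are bytes and str rather than str and unicode), and to_utf8 and its parameter names are hypothetical, not part of any library:

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known source encoding to UTF-8."""
    # Step 1: bytes -> text, using the encoding the data is actually in.
    text = data.decode(source_encoding)
    # Step 2: text -> bytes, this time as UTF-8.
    return text.encode("utf-8")


# Hypothetical sample: "café" stored as Latin-1.
latin1_bytes = b"caf\xe9"
utf8_bytes = to_utf8(latin1_bytes, "latin-1")
```

If source_encoding is wrong, the call may still succeed and quietly produce the wrong characters, which is why the encoding has to be known rather than guessed.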
