Converting byte string in unicode string

python string unicode python-3.x type-conversion

In strings (or Unicode objects in Python 2), \u has a special meaning: it says, "here comes a Unicode character specified by its Unicode code point". Hence u"\u0432" results in the character в.

The b'' prefix tells you this is a sequence of 8-bit bytes. A bytes object has no Unicode characters, so the \u escape has no special meaning. Hence, b"\u0432" is just the sequence of the six bytes \, u, 0, 4, 3 and 2.
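The difference is easy to check in an interpreter. The snippet below is a small sketch; the variable names are illustrative, and the bytes literal is written with a doubled backslash, which yields the same six bytes as b"\u0432" without triggering Python's invalid-escape warning.

```python
# A str literal: \u0432 is an escape for a single character.
text = "\u0432"
print(text)       # в
print(len(text))  # 1

# A bytes literal: the same six characters stay six separate bytes.
raw = b"\\u0432"
print(len(raw))   # 6
print(list(raw))  # [92, 117, 48, 52, 51, 50], i.e. \ u 0 4 3 2
```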

Essentially you have an 8-bit string containing not a Unicode character, but the specification of a Unicode character.

You can convert this specification using the unicode_escape codec.

>>> c = b'\\u0432'
>>> c.decode('unicode_escape')
'в'
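As a slightly fuller sketch of the same step, assuming c holds the six bytes described above: the codec turns the escape into one character. One caveat worth knowing is that unicode_escape reads any bytes outside an escape as Latin-1, so it is not a drop-in replacement for decoding UTF-8 data.

```python
# Six ASCII bytes: backslash, u, 0, 4, 3, 2.
c = b"\\u0432"
decoded = c.decode("unicode_escape")
print(decoded)       # в
print(len(decoded))  # 1

# Caveat: bytes that are not part of an escape are interpreted as Latin-1.
utf8_bytes = "é".encode("utf-8")  # two bytes: 0xC3 0xA9
mangled = utf8_bytes.decode("unicode_escape")
print(mangled)       # Ã© (each byte became its own character)
```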

Loved Lennart's answer. It put me on the right track for solving the particular problem I faced. What I added was the ability to produce HTML-compatible code for \u???? specifications in strings. Basically, only one line was needed:

results = results.replace('\\u','&#x')

This all came about from a need to convert JSON results to something that displays well in a browser. Here is some test code that is integrated with a cloud application:

# References:
# http://stackoverflow.com/questions/9746303/how-do-i-send-a-post-request-as-a-json
# https://docs.python.org/3/library/http.client.html
# http://docs.python-requests.org/en/v0.10.7/user/quickstart/#custom-headers
# http://stackoverflow.com/questions/606191/convert-bytes-to-a-python-string
# http://www.w3schools.com/charsets/ref_utf_punctuation.asp
# http://stackoverflow.com/questions/13837848/converting-byte-string-in-unicode-string
import urllib.request
import json

body = [ { "query": "co-development and language.name:English", "page": 1, "pageSize": 100 } ]
myurl = "https://core.ac.uk:443/api-v2/articles/search?metadata=true&fulltext=false&citations=false&similar=false&duplicate=false&urls=true&extractedUrls=false&faithfulMetadata=false&apiKey=SZYoqzk0Vx5QiEATgBPw1b842uypeXUv"
req = urllib.request.Request(myurl)
req.add_header('Content-Type', 'application/json; charset=utf-8')
jsondata = json.dumps(body)
jsondatabytes = jsondata.encode('utf-8')  # needs to be bytes
req.add_header('Content-Length', len(jsondatabytes))
print('\n', jsondatabytes, '\n')
response = urllib.request.urlopen(req, jsondatabytes)
results = response.read()
results = results.decode('utf-8')
results = results.replace('\\u', '&#x')  # produces html hex version of \u???? unicode characters
print(results)
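One thing to watch with the plain replace: it leaves the reference without its closing semicolon (&#x0432 rather than &#x0432;), which browsers usually tolerate but which is not well-formed HTML. The sketch below is a possible variant, not part of the original script: escapes_to_html and the sample string are made-up names for illustration, and the second half shows an alternative that parses the JSON first and lets Python's standard xmlcharrefreplace error handler emit decimal references.

```python
import json
import re

def escapes_to_html(text):
    """Turn each literal \\uXXXX escape in text into an HTML reference such as &#x0432;."""
    return re.sub(r"\\u([0-9a-fA-F]{4})", r"&#x\1;", text)

# Hypothetical JSON fragment containing literal \u escapes.
sample = '{"title": "\\u0432\\u043e\\u0434\\u0430"}'
converted = escapes_to_html(sample)
print(converted)  # {"title": "&#x0432;&#x043e;&#x0434;&#x0430;"}

# Alternative: let json.loads decode the escapes, then replace every
# non-ASCII character with a decimal character reference.
title = json.loads(sample)["title"]
as_refs = title.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(as_refs)    # &#1074;&#1086;&#1076;&#1072;
```

The second route also copes with characters outside the Basic Multilingual Plane, which JSON writes as a pair of \u escapes; the regular expression would convert each half of such a pair separately.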
