Python-3 and \x Vs \u Vs \U in string encoding and why

python python-3.x unicode python-unicode unicode-string

Python gives you a representation of the string, and for non-printable characters will use the shortest available escape sequence.

\x80 is the same character as \u0080 or \U00000080, but \x80 is just shorter. For chr(57344) the shortest notation is \ue000, you can't express the same character with \xhh, that notation only can be used for characters up to \0xFF.

For some characters there are even single-letter escapes, like \n for a newline, or \t for a tab.

Python has multiple notation options for historical and practical reasons. In a byte string you can only create bytes in the range 0 - 255, so there \xhh is helpful and more concise than having to use \U000hhhhh everywhere when you can't even use the full range available to that notation, and \xhh and \n and related codes are familiar to programmers from other languages.

CodeHunter

Python-3 and \x Vs \u Vs \U in string encoding and why

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last