Python string to unicode [duplicate]
Unicode escapes only work in unicode strings, so this
a="\u2026"
is actually a string of 6 characters: '\', 'u', '2', '0', '2', '6'.
To make unicode out of this, use decode('unicode-escape'):
a="\u2026"
print repr(a)
print repr(a.decode('unicode-escape'))
## '\\u2026'
## u'\u2026'
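The character counts described above can be checked directly. This is a minimal sketch, assuming a byte-string literal (the b prefix, available from Python 2.6 and in Python 3) so that the same lines run on either version; the names raw and decoded are illustrative and not part of the original answer.

```python
# The b"" prefix gives a byte string on Python 2.6+ and on Python 3.
# The doubled backslash makes the six characters explicit.
raw = b"\\u2026"  # backslash, 'u', '2', '0', '2', '6'

# Decoding with the unicode-escape codec interprets the escape sequence.
decoded = raw.decode("unicode_escape")

print(len(raw))              # 6
print(len(decoded))          # 1
print(decoded == u"\u2026")  # True
```

The escaped form is six units long before decoding and a single code point (U+2026, the horizontal ellipsis) afterwards.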
Decode it with the unicode-escape codec:
>>> a="Hello\u2026"
>>> a.decode('unicode-escape')
u'Hello\u2026'
>>> print _
Hello…
This is because in a non-unicode string the \u2026 escape is not recognised; it is instead treated as a literal series of characters (to put it more clearly, 'Hello\\u2026'). You need to decode the escapes, and the unicode-escape codec can do that for you.
Note that you can get unicode to recognise it in the same way by specifying the codec argument:
>>> unicode(a, 'unicode-escape')
u'Hello\u2026'
But the a.decode() way is nicer.
>>> a="Hello\u2026"
>>> print a.decode('unicode-escape')
Hello…
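The two spellings mentioned above, passing the codec to the unicode constructor and calling decode() on the string, should give the same result. The sketch below compares them; it assumes a byte-string literal so it runs on both Python 2.6+ and Python 3, and text_type is a hypothetical helper name that picks unicode on Python 2 and str on Python 3 (where unicode no longer exists).

```python
import sys

a = b"Hello\\u2026"  # byte string containing a literal backslash-u escape

# Assumption: pick the text type for the running interpreter.
# On Python 3 the else branch is never evaluated, so `unicode` is not looked up.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821

via_constructor = text_type(a, "unicode_escape")  # codec passed as an argument
via_decode = a.decode("unicode_escape")           # the decode() spelling

print(via_constructor == via_decode)  # True
print(len(via_decode))                # 6: 'H', 'e', 'l', 'l', 'o', U+2026
```

Both calls go through the same codec, so the choice between them is mostly a matter of readability.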