
How to print non-BMP Unicode characters in Tkinter (e.g. 𝄫)


As of Python 3.4, Tkinter cannot display these characters as they are supposed to look (although someone mentioned that using surrogate pairs may work [in Python 2.x]). However, you can implement methods that convert the characters into displayable codes and back, and call them wherever necessary: when you print to Text widgets, when you copy and paste, in file dialogs*, in the tab bar, in the status bar, and anywhere else your application displays text.

*The default Tkinter file dialogs do not allow for much internal engineering of the dialogs. I made my own file dialogs, partly to help with this issue. Let me know if you're interested. Hopefully I'll post the code for them here in the future.
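To decide when a conversion is needed at all, it can help to test a string for characters above U+FFFF first. The following is only a minimal sketch; the names has_non_bmp and surrogate_pair are made up for this example. The second helper just shows what the UTF-16 surrogate pair for a character looks like and makes no claim about whether Tkinter will render it.

```python
def has_non_bmp(text):
    """Return True if text contains any character above U+FFFF."""
    return any(ord(character) > 0xFFFF for character in text)


def surrogate_pair(character):
    """Return the UTF-16 (high, low) surrogate code units for one non-BMP character."""
    offset = ord(character) - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


print(has_non_bmp("plain ASCII"))
# U+1D12B is the double flat sign used as the example in the question.
print([hex(unit) for unit in surrogate_pair("\U0001D12B")])
```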

The methods below convert out-of-range characters into codes and vice versa. The codes are formatted with ordinal numbers, like this: {119083ū}. The braces and the ū are just there to distinguish this as a code. {119083ū} represents 𝄫. As you can see, I haven't yet bothered with a way to escape codes, although I did purposefully try to make the codes very unlikely to occur in ordinary text. The same is true for the ᗍ119083ūᗍ form used temporarily while converting back. Anyway, I mean to add escape sequences eventually. These methods are taken from my class (hence the self). (And yes, I know you don't have to use semicolons in Python. I just like them and think they make the code more readable in some situations.)

    import re;

    def convert65536(self, s):
        #Converts a string with out-of-range characters in it into a string with codes in it.
        l=list(s);
        i=0;
        while i<len(l):
            o=ord(l[i]);
            if o>65535:
                l[i]="{"+str(o)+"ū}";
            i+=1;
        return "".join(l);

    def parse65536(self, match):
        #This is a regular expression method used for substitutions in convert65536back()
        text=int(match.group()[1:-2]);
        if text>65535:
            return chr(text);
        else:
            return "ᗍ"+str(text)+"ūᗍ";

    def convert65536back(self, s):
        #Converts a string with codes in it into a string with out-of-range characters in it
        while re.search(r"{\d\d\d\d\d+ū}", s) is not None:
            s=re.sub(r"{\d\d\d\d\d+ū}", self.parse65536, s);
        s=re.sub(r"ᗍ(\d\d\d\d\d+)ūᗍ", r"{\1ū}", s);
        return s;
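For anyone who wants to try the scheme outside a class, here is a minimal sketch of the same round trip as module-level functions. The names encode_non_bmp and decode_non_bmp are made up for this example, and unlike the methods above it checks the upper bound 0x10FFFF before calling chr(), which would otherwise raise ValueError. It still has no escaping, so literal text that already looks like a valid code would be decoded too.

```python
import re

# Hypothetical standalone helpers (the names are not from the answer above).
_NON_BMP = re.compile(r"[\U00010000-\U0010FFFF]")
_CODE = re.compile(r"\{(\d{5,7})ū\}")


def encode_non_bmp(text):
    """Replace each character above U+FFFF with a {<ordinal>ū} code."""
    return _NON_BMP.sub(lambda m: "{%dū}" % ord(m.group()), text)


def decode_non_bmp(text):
    """Turn {<ordinal>ū} codes back into characters; leave other text alone."""
    def restore(match):
        ordinal = int(match.group(1))
        # Only ordinals in the non-BMP range are treated as codes.
        if 0x10000 <= ordinal <= 0x10FFFF:
            return chr(ordinal)
        return match.group()
    return _CODE.sub(restore, text)


original = "B double flat: \U0001D12B"
encoded = encode_non_bmp(original)
print(ascii(encoded))  # ascii() keeps the output printable on any console
print(decode_non_bmp(encoded) == original)
```

Typical use would be to call encode_non_bmp() on anything handed to a widget and decode_non_bmp() on anything read back from it.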


My answer is based on @Shule's answer but provides more Pythonic and easier-to-read code. It also shows a real use case.

This is the method that populates a tkinter.Listbox with items. There is no back conversion. This solution only takes care of displaying strings that contain characters Tcl does not allow.

    import logging
    from tkinter import Listbox, END, TclError

    # Assumed logger setup; the original snippet does not show where _log comes from.
    _log = logging.getLogger(__name__)


    class MyListbox(Listbox):
        # ...
        def populate(self):
            """Replace the listbox contents with the items in mydata_list."""
            def _convert65536(to_convert):
                """Converts a string with out-of-range characters in it into a
                string with codes in it.
                Based on <https://stackoverflow.com/a/28076205/4865723>.
                This is a workaround because Tkinter (Tcl) doesn't allow unicode
                characters outside of a specific range. This could be emoticons
                for example.
                """
                for character in to_convert[:]:
                    if ord(character) > 65535:
                        convert_with = '{' + str(ord(character)) + 'ū}'
                        to_convert = to_convert.replace(character, convert_with)
                return to_convert

            # delete all listbox items
            self.delete(0, END)

            # add items to listbox (mydata_list is defined elsewhere)
            for item in mydata_list:
                try:
                    self.insert(END, item)
                except TclError as err:
                    _log.warning('{} It will be converted.'.format(err))
                    self.insert(END, _convert65536(item))
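Since this use case never converts back, the placeholder does not have to be machine-readable. As a variation, the sketch below swaps in the character's Unicode name instead of its ordinal and converts every item up front rather than waiting for a TclError. The names displayable and populate are made up for this example, and populate accepts any object with Listbox-style delete() and insert() methods so it can be tried without a display.

```python
import unicodedata


def displayable(text):
    """Return text with every character above U+FFFF replaced by a label."""
    parts = []
    for character in text:
        if ord(character) > 0xFFFF:
            # Fall back to the U+XXXX form when the character has no name.
            fallback = "U+{:04X}".format(ord(character))
            parts.append("{" + unicodedata.name(character, fallback) + "}")
        else:
            parts.append(character)
    return "".join(parts)


def populate(listbox, items):
    """Refill a Listbox-like object, converting every item up front."""
    listbox.delete(0, "end")  # "end" is the value of tkinter.END
    for item in items:
        listbox.insert("end", displayable(item))
```

The trade-off is that these labels are longer than the ordinal codes and cannot be turned back into the original character without a name lookup.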