
Tkinter and 32-bit Unicode duplicating – any fix?


The fundamental problem is that Tcl and Tk are not very happy with non-BMP characters (those outside the Unicode Basic Multilingual Plane). Prior to 8.6.10, what happens is anyone's guess; the implementation simply assumed such characters didn't exist and was known to be buggy when they actually turned up (there are several tickets on various aspects of this). 8.7 will have stronger fixes in place (see TIP #389 for the details). The basic aim is that if you feed non-BMP characters in, you can get them out at the other side, so they can be written to a UTF-8 file or displayed by Tk if the font engine deigns to support them. Some operations will still be wrong, though, as the string implementation will still be using surrogates. 9.0 will fix things properly (by changing the fundamental character storage unit to be large enough to accommodate any Unicode codepoint), but that's a disruptive change.

With released versions, if you can get the surrogates over the wall from Python to Tcl, they'll probably end up in the GUI engine, which might do the right thing in some cases (not including any build I've currently got, FWIW, but I've got strange builds, so don't read very much into that). With 8.7, sending over UTF-8 will be able to work; that's part of the functionality profile that will be guaranteed. (The encoding functions exist in older versions, but with 8.6 releases they will do the wrong thing with non-BMP UTF-8, and with versions older than that they break weirdly.)
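Since the behaviour depends so heavily on the Tcl version, it can help to check which one your Python is linked against. The following is only a rough sketch: the helper names and the category labels ("unreliable", "partial", "round-trip", "full") are made up here to summarise the version breakdown above and are not part of Tcl or tkinter.

```python
import re


def parse_patchlevel(patchlevel):
    """Split a Tcl patchlevel such as '8.6.10' or '8.7a5' into (major, minor, patch)."""
    match = re.match(r"(\d+)\.(\d+)(?:[.ab](\d+))?", patchlevel)
    if match is None:
        raise ValueError("unrecognised patchlevel: {!r}".format(patchlevel))
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def non_bmp_outlook(patchlevel):
    """Map a patchlevel onto the (hypothetical) categories described above."""
    major, minor, patch = parse_patchlevel(patchlevel)
    if (major, minor) >= (9, 0):
        return "full"         # storage unit can hold any code point
    if (major, minor) == (8, 7):
        return "round-trip"   # characters survive, some operations still wrong
    if (major, minor, patch) >= (8, 6, 10):
        return "partial"      # assumption: better than older builds, no guarantees
    return "unreliable"       # "anyone's guess"


def tcl_patchlevel():
    """Ask the Tcl interpreter linked into tkinter for its version."""
    import tkinter  # imported lazily so the helpers above work without Tk
    return tkinter.Tcl().eval("info patchlevel")
```

Calling `non_bmp_outlook(tcl_patchlevel())` then gives a quick hint about what to expect on a given machine.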


The problem

Several things could have happened:
  • That is simply what the emoji looks like. In that case there is no fix other than changing the source emoji.
  • Tk and/or Tcl is confused by the emoji: it isn't sure which glyph to draw, so it draws two chipmunks. When I tried that emoji on my Linux computer, it threw an error.

The solution

The only solution may be to save the emoji as an image file, then display that image instead of the text. But there could be other, slightly more complicated ways. For example, you could place a rectangle or Frame over the second chipmunk to hide it.


As you pointed out, your code works as is on Windows (tested on Windows 10). For macOS, however, the following workaround should work:

  1. Convert the encoding of the emoji from UTF-32 to UTF-16. No functionality is lost: UTF-16 is a variable-length encoding, so any code point that can be represented in UTF-32 can also be represented in UTF-16. For modern emoji, the UTF-16 encoded value simply uses 32 bits (two 16-bit units), the same as UTF-32, so it should still support Unicode v11 characters.
  2. Pass the resulting string to the embedded Tcl/Tk interpreter.

From the UTF-16 section of Programming with Unicode:

In UTF-16, characters in ranges U+0000–U+D7FF and U+E000–U+FFFD are stored as a single 16-bit unit. Non-BMP characters (range U+10000–U+10FFFF) are stored as “surrogate pairs”, two 16-bit units: a high surrogate (in range U+D800–U+DBFF) followed by a low surrogate (in range U+DC00–U+DFFF).
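To make the surrogate-pair rule concrete, here is a minimal sketch of the arithmetic for a single non-BMP code point. The function name `to_surrogate_pair` is just an illustrative choice; the constants come from the ranges quoted above.

```python
def to_surrogate_pair(code_point):
    """Return the (high, low) UTF-16 surrogates for a non-BMP code point."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a non-BMP code point: {:#x}".format(code_point))
    offset = code_point - 0x10000        # 20 significant bits remain
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


# The chipmunk is U+1F43F
high, low = to_surrogate_pair(0x1F43F)
print("\\u{:04x}\\u{:04x}".format(high, low))  # prints \ud83d\udc3f
```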

For Tcl to substitute a Unicode-escaped string with the character/emoji it represents, the string itself must be of the form "\uXXXX" or "\uXXXX\uXXXX".

The chipmunk emoji therefore has to be written as its UTF-16 surrogate pair: "\ud83d\udc3f"

    # The Tcl/Tk code
    set chipmunk "\ud83d\udc3f"
    pack [set c [canvas .c -highlightcolor blue -highlightbackground black -background yellow]] -padx 4cm -pady 4cm -expand 1 -fill both
    set text_id [$c create text 0 0 -text $chipmunk -font [list * 180]]
    $c moveto $text_id 0 0

Unicode chipmunk in Tcl/Tk

The equivalent code in Python will, at some point, have to bypass tkinter and issue Tcl commands directly to the embedded/linked interpreter:

    import tkinter as tk

    # the top-level window
    top = tk.Tk()
    # the canvas
    c = tk.Canvas(top, highlightcolor='blue', highlightbackground='black', background='yellow')
    # create the text item, with placeholder text
    text_id = c.create_text(0, 0, font='* 180', text='to be replaced')
    # pack it
    c.pack(side='top', fill='both', expand=1, padx='4c', pady='4c')

    # 'Bypassing' tkinter, i.e. issuing Tcl/Tk calls directly
    # For Tk calls use c.tk.call(...); we will not use this.
    # For bare Tcl use c.tk.eval(...)

    # chipmunk in UTF-16 (in this instance it uses 32 bits to represent the code point),
    # as a raw string
    chipmunk = r"\ud83d\udc3f"
    # create another variable in Tcl/Tk
    c.tk.eval('set the_tcl_chipmunk {}'.format(chipmunk))
    # set the text_id item's -text option to the value of the variable the_tcl_chipmunk,
    # obtained by calling Tcl's set command
    c.tk.eval('{} itemconfig {} -text [set the_tcl_chipmunk]'.format(str(c), text_id))
    # Apparently a hack to get the chipmunk in position
    c.tk.eval('{} moveto {} 0 0'.format(str(c), text_id))

    # the main GUI event loop
    top.mainloop()

Unicode chipmunk in python

Getting the UTF-16 of chipmunk

There are two avenues you could pursue:

  1. Getting it from a website. I use fileformat.info all the time: look up the chipmunk on fileformat.info and copy the value shown for C/C++/Java source code.

  2. Doing the conversion from UTF-32 to UTF-16 in Python

    # A UTF-32 string, since it's of the form "\UXXXX_XXXX"
    # ( _ is not part of the syntax, a mere visual aid for illustrative purposes)
    chipmunk_utf_32 = '\U0001F43F'
    # convert/encode it to UTF-16 (big endian), to get a bytes object
    chipmunk_utf_16 = chipmunk_utf_32.encode('utf-16-be')
    # obtain the hex representation
    chipmunk_utf_16 = chipmunk_utf_16.hex()
    # format it to be an escaped UTF-16 Tcl string
    chipmunk = '\\u{}\\u{}'.format(chipmunk_utf_16[0:4], chipmunk_utf_16[4:8])
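The conversion above handles exactly one non-BMP character. If you need to pass a longer string that mixes ordinary and non-BMP characters, one possible generalisation is to escape every UTF-16 unit. This is only a sketch: `to_tcl_escapes` is a hypothetical helper name, and escaping everything (rather than just the surrogates) is an assumption made here so that spaces, brackets and dollar signs in the text cannot be misread by Tcl.

```python
def to_tcl_escapes(text):
    """Encode text as a run of Tcl \\uXXXX escapes, one per UTF-16 unit."""
    data = text.encode("utf-16-be")  # raises UnicodeEncodeError on lone surrogates
    # every UTF-16 unit is two bytes, i.e. four hex digits
    units = [data[i:i + 2].hex() for i in range(0, len(data), 2)]
    return "".join("\\u" + unit for unit in units)


print(to_tcl_escapes("\U0001F43F"))  # prints \ud83d\udc3f
```

The result contains no whitespace, so it can be dropped into a `c.tk.eval('set the_tcl_chipmunk {}'.format(...))` call as a single Tcl word. Note that an empty input yields an empty string, which is not a usable word on its own, and whether Tcl recombines the surrogate escapes correctly still depends on the Tcl/Tk version.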

EDIT: The whole script

    import tkinter as tk

    # A UTF-32 string, since it's of the form "\UXXXX_XXXX"
    # ( _ is not part of the syntax, a mere visual aid for illustrative purposes)
    chipmunk_utf_32 = '\U0001F43F'
    # convert/encode it to UTF-16 (big endian), to get a bytes object
    chipmunk_utf_16 = chipmunk_utf_32.encode('utf-16-be')
    # obtain the hex representation
    chipmunk_utf_16 = chipmunk_utf_16.hex()
    # format it to be an escaped UTF-16 Tcl string
    chipmunk = '\\u{}\\u{}'.format(chipmunk_utf_16[0:4], chipmunk_utf_16[4:8])

    # the top-level window
    top = tk.Tk()
    # the canvas
    c = tk.Canvas(top, highlightcolor='blue', highlightbackground='black', background='yellow')
    # create the text item, with placeholder text
    text_id = c.create_text(0, 0, font='* 180', text='to be replaced')
    # pack it
    c.pack(side='top', fill='both', expand=1, padx='4c', pady='4c')

    # 'Bypassing' tkinter, i.e. issuing Tcl/Tk calls directly
    # For Tk calls use c.tk.call(...); we will not use this.
    # For bare Tcl use c.tk.eval(...)

    # chipmunk in UTF-16 (in this instance it uses 32 bits to represent the code point),
    # as a raw string
    #print(chipmunk)
    #chipmunk = r"\ud83d\udc3f"
    # create another variable in Tcl/Tk
    c.tk.eval('set the_tcl_chipmunk {}'.format(chipmunk))
    # set the text_id item's -text option to the value of the variable the_tcl_chipmunk,
    # obtained by calling Tcl's set command
    c.tk.eval('{} itemconfig {} -text [set the_tcl_chipmunk]'.format(str(c), text_id))
    # Apparently a hack to get the chipmunk in position
    c.tk.eval('{} moveto {} 0 0'.format(str(c), text_id))

    # the main GUI event loop
    top.mainloop()