term.ljust, center etc. incorrect for sequences containing U+FE0F (Variation Selector-16) #267

dscrofts · 2024-03-20T20:54:35Z

Example:

```python
from blessed import Terminal

term = Terminal()
strings = ["123", "456", "🗣️  "]

print("with term.ljust:")
for string in strings:
    print(f"{term.ljust(string, 5)} 1")

print("without term.ljust:")
for string in strings:
    print(f"{string:<5} 1")
```

Output (term.ljust adds one additional cell):

```
with term.ljust:
123   1
456   1
🗣️     1
without term.ljust:
123   1
456   1
🗣️    1
```

However, this is not consistent across all Unicode sequences. For example, changing `strings` to `["123", "456", "🤔 "]` gives:

Output (term.ljust padding is correct):

```
with term.ljust:
123   1
456   1
🤔    1
without term.ljust:
123   1
456   1
🤔     1
```
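For comparison, Python's `str.format` pads by `len()`, i.e. by codepoint count, not by display cells. This sketch reproduces the "without term.ljust" column. The rendered cell counts in `ASSUMED_CELLS` are an assumption for a terminal that draws both emoji two cells wide (VS16 honoured), which is not true of every terminal.

```python
# Strings from the two examples above, written with explicit escapes.
SPEAKING_HEAD = "\U0001F5E3\uFE0F  "  # U+1F5E3, VS16, two spaces
THINKING = "\U0001F914 "              # U+1F914, one space

# Assumption: cells each string occupies in a terminal that honours VS16.
ASSUMED_CELLS = {SPEAKING_HEAD: 4, THINKING: 3}

def format_padding(s: str, width: int = 5) -> int:
    """Number of spaces f"{s:<width}" appends; str.format counts codepoints."""
    return len(f"{s:<{width}}") - len(s)

for s, cells in ASSUMED_CELLS.items():
    pad = format_padding(s)
    # 5 cells is the target column; anything else misaligns the trailing "1".
    print(f"len={len(s)} pad={pad} rendered={cells + pad} cells")
```

Under that assumption the 🗣️ row lands on 5 cells only because the invisible VS16 codepoint is counted as one column, standing in for the second cell the emoji actually occupies; the 🤔 row lands on 6.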


jquast · 2024-03-20T21:05:30Z

Hello, thanks for the report.

I was aware of this issue, but there was no bug to track it. I could probably add a simple workaround here in blessed, so I will try to do that soon.

I recently added support for Variation Selector-16 (U+FE0F) to wcwidth, but the way blessed uses that library still gets the calculation wrong: it adds up the width of each individual codepoint as returned by the wcwidth.wcwidth() function.
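A rough way to see the miscount is to sum a per-codepoint width, as described above. This sketch does not call wcwidth: the hypothetical `codepoint_width()` approximates a per-codepoint lookup with the standard library's `unicodedata` (0 for combining and format characters, 2 for East Asian Wide/Fullwidth, otherwise 1), so it will differ from wcwidth on some inputs.

```python
import unicodedata

def codepoint_width(ch: str) -> int:
    """Hypothetical stand-in for a per-codepoint width lookup (not wcwidth itself)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters; U+FE0F is category Mn
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian Wide or Fullwidth
    return 1

def summed_width(s: str) -> int:
    """Width as a plain sum over codepoints, ignoring how they combine."""
    return sum(codepoint_width(ch) for ch in s)

speaking_head = "\U0001F5E3\uFE0F  "
thinking = "\U0001F914 "
print(summed_width(speaking_head))  # 3: U+1F5E3 is East Asian Width N, VS16 adds 0
print(summed_width(thinking))       # 3: U+1F914 is East Asian Width W
```

With a target of 5, a sum of 3 yields two spaces of padding. If the terminal draws U+1F5E3 followed by VS16 two cells wide, the line ends up 6 cells wide: the one extra cell reported above.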

I might:

- add the functionality of interpreting terminal sequences directly into the wcwidth library, which blessed would then offload to ("Should wcwidth provide rjust, ljust, center and textwrap?" wcwidth#93),
- or add a "grapheme clustering" function to wcwidth that blessed could use,
- or just make blessed do the "grapheme clustering" needed to account for these correctly.
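The "grapheme clustering" options could look roughly like the minimal sketch below. It is not a UAX #29 segmenter: `clusters()`, `cluster_width()`, `display_width()` and `ljust_cells()` are hypothetical names, the grouping only attaches combining marks and format characters (including VS16) to the preceding codepoint, and it assumes any cluster containing U+FE0F renders two cells wide. wcwidth applies VS16 widening only to a specific set of emoji base characters, so treat this as an illustration rather than the library's behavior.

```python
import unicodedata

VS16 = "\uFE0F"
ZERO_WIDTH_CATEGORIES = ("Mn", "Me", "Cf")

def clusters(s: str) -> list[str]:
    """Group each base codepoint with the zero-width codepoints that follow it."""
    out: list[str] = []
    for ch in s:
        if out and unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES:
            out[-1] += ch  # extend the previous cluster, e.g. with VS16
        else:
            out.append(ch)
    return out

def cluster_width(cluster: str) -> int:
    """Assumed display width of one cluster."""
    if VS16 in cluster:
        return 2  # assumption: VS16 always requests wide emoji presentation
    base = cluster[0]
    if unicodedata.category(base) in ZERO_WIDTH_CATEGORIES:
        return 0  # a zero-width codepoint with no base before it
    return 2 if unicodedata.east_asian_width(base) in ("W", "F") else 1

def display_width(s: str) -> int:
    return sum(cluster_width(c) for c in clusters(s))

def ljust_cells(s: str, width: int) -> str:
    """Pad with spaces to `width` display cells rather than codepoints."""
    return s + " " * max(0, width - display_width(s))

speaking_head = "\U0001F5E3\uFE0F  "
print(len(clusters(speaking_head)))  # 3 clusters: emoji+VS16, space, space
print(display_width(speaking_head))  # 4
```

With this grouping the 🗣️ string measures 4 cells, so padding to 5 adds a single space. In a terminal that does not honour VS16, the emoji may be drawn narrower, and the same padding would then be off in the other direction.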

Correct accounting for emoji that include U+FE0F is difficult: only 7 terminals supported it at last check. I wrote more about it here: https://www.jeffquast.com/post/ucs-detect-test-results/ and I've also gotten pushback from the author of libvte, the library used by terminals like GNOME's, who refuses to support it at all (https://gitlab.gnome.org/GNOME/vte/-/issues/2580). So I've been a bit distracted just trying to get terminal emulators to support it, rather than having blessed support it, but I will definitely get to it soon.

jquast · 2024-03-20T21:06:44Z

To add: I could tell this string included U+FE0F from the following commands:

```python
>>> import unicodedata
>>> list(map(unicodedata.name, '🗣️  '))
['SPEAKING HEAD IN SILHOUETTE', 'VARIATION SELECTOR-16', 'SPACE', 'SPACE']
>>> list(map(hex, map(ord, '🗣️  ')))
['0x1f5e3', '0xfe0f', '0x20', '0x20']
```
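The same inspection can be wrapped in a small helper. `describe()` and `has_vs16()` are hypothetical names; passing a default to `unicodedata.name()` avoids the `ValueError` it raises for codepoints that have no name, such as control characters.

```python
import unicodedata

def describe(s: str) -> list[str]:
    """List each codepoint of s as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in s]

def has_vs16(s: str) -> bool:
    """True if s contains VARIATION SELECTOR-16 (U+FE0F)."""
    return "\uFE0F" in s

text = "\U0001F5E3\uFE0F  "
for line in describe(text):
    print(line)
print(has_vs16(text))  # True
```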

jquast · 2024-03-20T21:08:56Z

Also, Python's built-in formatting gets this horribly wrong: it is not aware of emoji, terminal sequences, or even basic East Asian characters like Chinese or Japanese. In your case it just happens to get it right by accident :)
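To illustrate the East Asian case with the standard library alone: `str.ljust()` and format specs count codepoints, while CJK ideographs such as 日 and 本 are East Asian Wide and normally take two cells each. `eaw_width()` and `ljust_eaw()` are hypothetical helpers based only on `unicodedata.east_asian_width()`; they ignore emoji sequences, combining marks and terminal escape sequences entirely.

```python
import unicodedata

def eaw_width(s: str) -> int:
    """Cells for s, counting East Asian Wide/Fullwidth codepoints as 2, the rest as 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in s)

def ljust_eaw(s: str, width: int) -> str:
    """Pad s with spaces to `width` cells as measured by eaw_width()."""
    return s + " " * max(0, width - eaw_width(s))

word = "日本"
print(repr(word.ljust(6)))       # '日本    ': 4 spaces, so 8 cells on screen
print(repr(ljust_eaw(word, 6)))  # '日本  ': 2 spaces, so 6 cells on screen
```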

I wrote an issue about what it might take for Python's built-in formatting to account for emoji correctly: jquast/wcwidth#94

jquast changed the title ~~term.ljust calculating incorrect padding value with some unicode sequences~~ term.ljust, center etc. incorrect for sequences containing U+FE0F (Variation Selector-16) Mar 20, 2024
