std/text/unicode width does not return column-width 2 for emojis #457

Open
erf opened this issue Feb 1, 2024 · 5 comments

erf commented Feb 1, 2024

I expected the width function to return 2 for emojis, given that it uses the EastAsianWidth.txt file.

  println(width("👾"))

This returns 1.

Is this function supposed to work like the display_width method of ziglyph, or like the Python wcwidth specification? That is, should it give the rendered column width for modern terminal emulators using the latest Unicode standard?

TimWhiting commented Feb 1, 2024

This is what the standard library documentation says:

// Return the column-width of a unicode character.
// Equivalent to ``wcwidth``
pub fun char/width( c : char ) : int {
  if (zero-widths.force.contains(c.int)) then 0
  elif (asian-wide.force.contains(c.int)) then 2
  else 1
}

// Return the total column-width of a string.
pub fun string/width( s : string ) : int {
  var total := 0
  s.foreach( fn(c) {
    total := total + c.width
  })
  total
}

So yes, I believe the intent is to match terminal emulators, as in the Python wcwidth spec. However, I'm not certain the data is currently up to date (I'm not sure when Daan last updated the asian-wide list).
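
For comparison, here is a minimal Python sketch of the same three-way rule (zero-width, wide, otherwise 1), using the standard `unicodedata` module in place of Koka's `zero-widths` and `asian-wide` tables. The helper names `char_width` and `string_width` are hypothetical, and the zero-width test is an approximation rather than the full wcwidth behaviour (for example, control characters are not treated specially). The result depends on the Unicode data shipped with the Python build; on a recent version the emoji from the report is classified as wide.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate column width of a single code point (hypothetical helper)."""
    # Combining marks and format characters (such as ZWJ) occupy no column.
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide (W) and Fullwidth (F) characters occupy two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def string_width(s: str) -> int:
    """Sum the per-code-point widths, mirroring the Koka string/width above."""
    return sum(char_width(ch) for ch in s)


if __name__ == "__main__":
    print(string_width("\U0001F47E"))  # the alien monster emoji from the report
    print(string_width("abc"))
```

If the Koka tables were regenerated from a current EastAsianWidth.txt, the same lookup would presumably return 2 for this character as well.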

Also, I would have expected the following to print two UTF-16 code units, but it prints only a single UTF-32 character. I guess I'm less certain about the intended underlying representation for characters. I'll have to ask Daan.
"👾".slice.foreach(fn(c) c.println)
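
To make the code point versus code unit distinction concrete, the following Python sketch decomposes the same emoji both ways. It only illustrates the encodings; whether Koka's `char` is meant to be a full Unicode code point is the open question here, and the function name `utf16_units` is hypothetical.

```python
def utf16_units(s: str) -> list[str]:
    """Return the UTF-16 code units of s as hex strings (hypothetical helper)."""
    data = s.encode("utf-16-be")  # big-endian, no byte order mark
    return [hex(int.from_bytes(data[i:i + 2], "big")) for i in range(0, len(data), 2)]


emoji = "\U0001F47E"

# Iterating a Python str yields code points, so this list has one entry.
code_points = [hex(ord(ch)) for ch in emoji]

# In UTF-16 the same character is a surrogate pair, so this list has two entries.
units = utf16_units(emoji)

print(code_points)  # ['0x1f47e']
print(units)        # ['0xd83d', '0xdc7e']
```

Printing one value per iteration is therefore what a code-point-based iteration would produce, while a UTF-16-based one would print the two surrogates.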

erf commented Feb 4, 2024

I'll just link this article here. It's a good read with some valuable links:

https://mitchellh.com/writing/grapheme-clusters-in-terminals

TimWhiting commented

Thanks for the link!

TimWhiting commented Feb 4, 2024

Great post. I'll have to look at the algorithm he references to improve Koka's clustering.
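
As a rough illustration of why clustering matters for width, the Python sketch below sums per-code-point widths for a zero-width-joiner emoji sequence. The helper `naive_width` is hypothetical and deliberately ignores grapheme clusters; it is not the algorithm from the article, only a demonstration of the problem that algorithm addresses.

```python
import unicodedata


def naive_width(s: str) -> int:
    """Sum per-code-point widths, ignoring grapheme clusters (illustrative only)."""
    total = 0
    for ch in s:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # zero-width: combining marks, ZWJ, other format characters
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


# Man + ZWJ + woman + ZWJ + girl: five code points forming one grapheme cluster.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

print(len(family))          # number of code points
print(naive_width(family))  # per-code-point sum
```

The per-code-point sum counts each of the three emoji separately, whereas a terminal that clusters graphemes would typically draw the whole sequence as a single two-column glyph.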

erf commented Feb 4, 2024

Yeah, I'm a beta tester for the Ghostty terminal (it's great!), and they have implemented Mode 2027 for proper Unicode handling.

TimWhiting added the bug label Feb 7, 2024