Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chinese characters are chopped #227

Open
migueldeicaza opened this issue Apr 17, 2022 · 8 comments
Open

Chinese characters are chopped #227

migueldeicaza opened this issue Apr 17, 2022 · 8 comments

Comments

@migueldeicaza
Copy link
Owner

migueldeicaza commented Apr 17, 2022

Given a chinese character that uses more than one cell, and another one, we get two CTRuns, one for the Chinese character, and one that contains both a space and the second character. The space when we render seems to be overwriting the second half of the Chinese character:

image

The patch below extracts the position of the rune relative to the line, and helps a bit, but now the last character of the line is chopped, so there is some clipping somewhere I need to find, this is what it looks like now:

image

diff --git a/Sources/SwiftTerm/Apple/AppleTerminalView.swift b/Sources/SwiftTerm/Apple/AppleTerminalView.swift
index 59f8643..4ba20e0 100644
--- a/Sources/SwiftTerm/Apple/AppleTerminalView.swift
+++ b/Sources/SwiftTerm/Apple/AppleTerminalView.swift
@@ -564,9 +564,13 @@ extension TerminalView {
                     CTRunGetGlyphs(run, CFRange(), bufferPointer.baseAddress!)
                     count = runGlyphsCount
                 }
+                let x = [CGPoint](unsafeUninitializedCapacity: runGlyphsCount) { (bufferPointer, count) in
+                    CTRunGetPositions(run, CFRange(), bufferPointer.baseAddress!)
+                    count = runGlyphsCount
+                }
 
                 var positions = runGlyphs.enumerated().map { (i: Int, glyph: CGGlyph) -> CGPoint in
-                    CGPoint(x: lineOrigin.x + (cellDimension.width * CGFloat(col + i)), y: lineOrigin.y + ceil(lineLeading + lineDescent))
+                    CGPoint(x: lineOrigin.x + x [i].x /* (cellDimension.width * CGFloat(col + i))*/, y: lineOrigin.y + ceil(lineLeading + lineDescent))
                 }
 
                 var backgroundColor: TTColor?
@JoeMatt
Copy link

JoeMatt commented Apr 19, 2022

I think you need to convert the whole string to a width, not just the glyphs, especially in Unicode which can have reverse text, subscripts etc.

https://developer.apple.com/documentation/foundation/nsstring/1524729-boundingrectwithsize

@andy380743909
Copy link

I think you need to convert the whole string to a width, not just the glyphs, especially in Unicode which can have reverse text, subscripts etc.

https://developer.apple.com/documentation/foundation/nsstring/1524729-boundingrectwithsize

I think it's not the string width but glyph postions in line are wrong. You just point out a right direction of this issue, because the CoreText calculate each character's position not just using the char width multiply the index of it in a line.
I will try to fix this issue, good luck to me.

@migueldeicaza
Copy link
Owner Author

The patch above sort of works, but there are some additional tasks that are required involving the cursor rendering - otherwise what the cursor renders when it is blinking and the text in the terminals are out of sync.

And we have an additional problem with Emoji

Emoji Problem

If I have this text on the first column:

🍂hello

With Emacs, the contents of the buffer are:

    (lldb) p buffer.lines [vy][0].getCharacter()
    (Character) "🍂"
    (lldb) p buffer.lines [vy][1].getCharacter()
    (Character) "h"
    (lldb) p buffer.lines [vy][2].getCharacter()
    (Character) "e"

For vi, instead, I get this:

    (lldb) p buffer.lines [vy][0].getCharacter()
    (Character) "🍂"
    (lldb) p buffer.lines [vy][1].getCharacter()
    (Character) "\0"
    (lldb) p buffer.lines [vy][2].getCharacter()
    (Character) "h"

The way this happens is that vi prints the emoji first at columb, then manually sets the cursor location to column 3, and prints "hello". This is why we have a 0 in the buffer at position [1].
Emacs on the other hand, will output the string in one go, and relies on the terminal to place the character in the right place.

With Chinese characters, we have a different situation, both vi and emacs output the character directly, without doing the cursor positioning trick, so this does not pose a problem.

The challenge seems to be that we might need to detect an Emoji, and in that case, if the next column contains a zero, not insert a space

The problem with the vi case is that when the offsets are computed, the emoji is at position 0, then the space corresponding to the 0 value is given the slot in column 3, rather than column 1. So we produce this:

Instead of:

Cursor Rendering

  1. Will need to set the cursor size based on the glyph size
  2. Will need to also compute the cursor location based on the runs before the one under the cursor.

There are a few scenarios to handle.

Bad news for vi

When we query the runs, we get two runs

  1. The emoji
  2. The string " hello", notice the space at the beginning.

106 glyphs for hello and the rest of the empty string
Glyph 0 is code 3
Glyph 1.. is code 570, 534, 605, 605, 629, 485, 3, 3, 3

So the “3” is a space, the good news is that it does not render anything, but we start rendering it at the wrong position, because we have a nil there.

This new patch handles the vi special scenario, and would need to also be replicated for cursors:

https://gist.github.com/migueldeicaza/b3451fb7880def299112e5e362c94323

@migueldeicaza
Copy link
Owner Author

Perhaps an alternative is to allocate cells when the characters are generated. So when double-width characters like Chinese characters or emoji are produced, I record both the character and an empty slot after, and use this information while rendering, so I can normalize all output into a grid.

And perhaps rather than trying to create a single attributed string for the whole line, I need to break up the attributed string at these width-changing boundaries

@migueldeicaza
Copy link
Owner Author

Some of the suggestion above already existed in the codebase, and I should addressed a few scenarios for characters that did not have the right metrics. There might be more places where I need to add this information.

@lampmanyao
Copy link
Contributor

Given a chinese character that uses more than one cell, and another one, we get two CTRuns, one for the Chinese character, and one that contains both a space and the second character. The space when we render seems to be overwriting the second half of the Chinese character:

image

The patch below extracts the position of the rune relative to the line, and helps a bit, but now the last character of the line is chopped, so there is some clipping somewhere I need to find, this is what it looks like now:

image
diff --git a/Sources/SwiftTerm/Apple/AppleTerminalView.swift b/Sources/SwiftTerm/Apple/AppleTerminalView.swift
index 59f8643..4ba20e0 100644
--- a/Sources/SwiftTerm/Apple/AppleTerminalView.swift
+++ b/Sources/SwiftTerm/Apple/AppleTerminalView.swift
@@ -564,9 +564,13 @@ extension TerminalView {
                     CTRunGetGlyphs(run, CFRange(), bufferPointer.baseAddress!)
                     count = runGlyphsCount
                 }
+                let x = [CGPoint](unsafeUninitializedCapacity: runGlyphsCount) { (bufferPointer, count) in
+                    CTRunGetPositions(run, CFRange(), bufferPointer.baseAddress!)
+                    count = runGlyphsCount
+                }
 
                 var positions = runGlyphs.enumerated().map { (i: Int, glyph: CGGlyph) -> CGPoint in
-                    CGPoint(x: lineOrigin.x + (cellDimension.width * CGFloat(col + i)), y: lineOrigin.y + ceil(lineLeading + lineDescent))
+                    CGPoint(x: lineOrigin.x + x [i].x /* (cellDimension.width * CGFloat(col + i))*/, y: lineOrigin.y + ceil(lineLeading + lineDescent))
                 }
 
                 var backgroundColor: TTColor?

seems we don't need the space after a Unicode character.

for example,
$ cd dir && tree .
.
├── 你好.txt

in the func buildAttributedString(),

res.append (NSAttributedString(string: str, attributes: getAttributes(attr, withUrl: hasUrl)))
,
the str's value will become to "你 好 .txt", there's a space between '你' and '好', and a space between '好' and '.',
even we fix the space overlapping issue (I'v tried don't append a space after a unicode character), it just works well in the rendering mode, but in the selection mode, the spaces still be there, and we paste it to somewhere else, we just got "你 好 .txt" rather than "你好.txt".

@andy380743909
Copy link

Given a chinese character that uses more than one cell, and another one, we get two CTRuns, one for the Chinese character, and one that contains both a space and the second character. The space when we render seems to be overwriting the second half of the Chinese character:
image
The patch below extracts the position of the rune relative to the line, and helps a bit, but now the last character of the line is chopped, so there is some clipping somewhere I need to find, this is what it looks like now:
image

diff --git a/Sources/SwiftTerm/Apple/AppleTerminalView.swift b/Sources/SwiftTerm/Apple/AppleTerminalView.swift
index 59f8643..4ba20e0 100644
--- a/Sources/SwiftTerm/Apple/AppleTerminalView.swift
+++ b/Sources/SwiftTerm/Apple/AppleTerminalView.swift
@@ -564,9 +564,13 @@ extension TerminalView {
                     CTRunGetGlyphs(run, CFRange(), bufferPointer.baseAddress!)
                     count = runGlyphsCount
                 }
+                let x = [CGPoint](unsafeUninitializedCapacity: runGlyphsCount) { (bufferPointer, count) in
+                    CTRunGetPositions(run, CFRange(), bufferPointer.baseAddress!)
+                    count = runGlyphsCount
+                }
 
                 var positions = runGlyphs.enumerated().map { (i: Int, glyph: CGGlyph) -> CGPoint in
-                    CGPoint(x: lineOrigin.x + (cellDimension.width * CGFloat(col + i)), y: lineOrigin.y + ceil(lineLeading + lineDescent))
+                    CGPoint(x: lineOrigin.x + x [i].x /* (cellDimension.width * CGFloat(col + i))*/, y: lineOrigin.y + ceil(lineLeading + lineDescent))
                 }
 
                 var backgroundColor: TTColor?

seems we don't need the space after a Unicode character.

for example, $ cd dir && tree . . ├── 你好.txt

in the func buildAttributedString(),

res.append (NSAttributedString(string: str, attributes: getAttributes(attr, withUrl: hasUrl)))

,
the str's value will become to "你 好 .txt", there's a space between '你' and '好', and a space between '好' and '.',
even we fix the space overlapping issue (I'v tried don't append a space after a unicode character), it just works well in the rendering mode, but in the selection mode, the spaces still be there, and we paste it to somewhere else, we just got "你 好 .txt" rather than "你好.txt".

Yes, when the terminal builds attributed string for rendering, it appends a whitespace char after some special chars( double width chars Chinese char or Emoji, or something else). But when drawing the attributed string, the whitespace char is also drawn, it must be handled specially.

@andy380743909
Copy link

Perhaps an alternative is to allocate cells when the characters are generated. So when double-width characters like Chinese characters or emoji are produced, I record both the character and an empty slot after, and use this information while rendering, so I can normalize all output into a grid.

And perhaps rather than trying to create a single attributed string for the whole line, I need to break up the attributed string at these width-changing boundaries

If we use an empty slot, we must handle it when drawing the attributed string

CTFontDrawGlyphs(runFont, runGlyphs, &positions, positions.count, context)

Or we can use zero width space character (https://en.wikipedia.org/wiki/Zero-width_space) as an empty slot.
I tried with "\u{200B}", the rendering is ok
image

But the selection still has some problem

IMG_8050

andy380743909 added a commit to andy380743909/SwiftTerm that referenced this issue Oct 24, 2023
alexstaravoitau added a commit to rationalmatter/SwiftTerm that referenced this issue Nov 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants