
sixels: last line cut/truncated on terminal emulators with "correct" text cursor placement #192

Open
dnkl opened this issue Feb 29, 2024 · 60 comments
Labels
compatibility Compatibility (e.g. terminal quirks) research Research & discussion

Comments

@dnkl
Contributor

dnkl commented Feb 29, 2024

Sixel-capable terminal emulators have gotten cursor placement (after emitting the sixel) wrong since the beginning. They usually put the cursor on a new line under the sixel. This means the terminal content may scroll if a sixel is printed on the last row.

However, it's not how the VT340 did it. The simplified explanation is that it places the cursor on the last line of the sixel. Thus, if you want to print text under the sixel, you first have to print a newline.

The real algorithm is slightly more complex than that. A sixel is 6 pixels tall. This means it can cover two text rows. The DEC cursor placement algorithm puts the text cursor where the top pixel is. This means there are times when two newlines are required to print text under the sixel.
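To make the "one or two newlines" point concrete, here is a minimal Python sketch (an illustration, not code taken from any terminal) comparing the DEC rule with the "bottom pixel" rule discussed later in the thread. It assumes the image starts at the top of a text row, that every band is a full 6 pixels tall, and that the cell height is known; all function names are hypothetical.

```python
SIXEL_BAND = 6  # each sixel band is 6 pixels tall


def last_band_top(height_px: int) -> int:
    """Pixel offset (from the image top) of the first pixel row of the last band."""
    return ((height_px - 1) // SIXEL_BAND) * SIXEL_BAND


def cursor_row_dec(height_px: int, cell_h: int) -> int:
    """DEC rule: text row (0 = row where the image starts) holding the top pixel of the last band."""
    return last_band_top(height_px) // cell_h


def cursor_row_bottom(height_px: int, cell_h: int) -> int:
    """'Bottom pixel' rule: text row holding the last pixel of the last band."""
    return (last_band_top(height_px) + SIXEL_BAND - 1) // cell_h


def newlines_needed_dec(height_px: int, cell_h: int) -> int:
    """Newlines needed under the DEC rule to reach the first row below the image."""
    return cursor_row_bottom(height_px, cell_h) - cursor_row_dec(height_px, cell_h) + 1


if __name__ == "__main__":
    # VT340-like 20-pixel-tall character cells
    for h in (24, 120):
        print(h, newlines_needed_dec(h, 20))
```

With 20-pixel cells this prints `24 2` and `120 1`: a 24-pixel image's last band starts on row 0 but ends on row 1, so two newlines are needed, while for a 120-pixel image one is enough. Under the "bottom pixel" rule the answer would always be one.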

A number of terminals have started to implement the correct behavior. Terminals that implement the DEC placement algorithm are foot, contour, DomTerm and WezTerm. There may be more that I'm not aware of. XTerm is close to correct, but last time I checked, it placed the cursor on the bottom pixel (i.e. you always need a single newline).

Right now, running chafa <image> && echo "XXXXXX" will look something like this in e.g. foot:

[Image: chafa-last-line-cut]
(picture shows a part of my dog's paw...)

A bit more information here:

@hackerb9

hackerb9 commented Mar 1, 2024

Hi @dnkl. The VT340's algorithm should not be followed too strictly in modern terminals. Despite what the documentation implied, it uses a fast heuristic which relies on the character cell being 20 pixels tall. That algorithm was faster but included a glitch which should not be copied.

If you are designing a terminal that uses characters that are not 20 pixels tall, the algorithm does not apply and will have to be adapted in one of two ways:

  1. I suggest using the simpler algorithm which I believe you referred to as "bottom pixel". Such a terminal will be compatible with the way that programmers presumed the VT340 always worked and that DEC's documentation would lead a reasonable person to believe. All known original programs and sixel images for the VT340 will work correctly with that algorithm. Additionally, it is easily programmed for as one knows exactly how to draw text under any graphic: just send a newline.
  2. j4james has suggested limiting the sixel image resolution so that, regardless of the font resolution, each character cell shows only 10x20 pixels. This might be useful for someone who wants to run hypothetical VT340 software from thirty years ago which knows about and works around the VT340's cursor positioning quirk. One major downside is that graphical resolution is limited and the only way to increase it is to make the font so tiny it is unreadable.

I strongly believe the first method is the correct one for most modern terminals. It lets programmers easily create software that integrates graphics with character cell text interfaces, which is to me what makes sixels useful.


If you've read my discussion with j4james about whether this VT340 behavior is a "glitch", you'll see that even though he believes it is the historical behavior and thus correct for any terminal that claims to emulate a VT340, neither of us could come up with an easy solution for application programmers who want to just splat a sixel image on the screen and show some text underneath it. Since a workaround requires the application to model the internal state of the VT340, no sane program will ever intentionally use this odd behavior, whether it is technically a glitch or not.

@dnkl
Contributor Author

dnkl commented Mar 2, 2024

@hackerb9 I don't mind changing foot to always put the cursor on the last row touched by the sixel (i.e. the bottom pixel of the last sixel).

What I don't want is slightly different behavior in modern terminals, and I was under the impression that the other "correct" terminals also followed the DEC algorithm? If not, I'd be more than happy to update foot.

@dnkl
Contributor Author

dnkl commented Mar 2, 2024

That said, it looks like chafa isn't emitting a newline at all, so even with the tweaked cursor placement (always put it on the last row touched by the sixel), the image is sometimes cut off.

@dnkl
Contributor Author

dnkl commented Mar 3, 2024

@PerBothner @christianparpart @wez I was hoping we could all agree on how to implement cursor placement after emitting a sixel. As far as I can tell, foot, DomTerm, Contour and WezTerm all place the cursor on the same row as the last sixel. But do you follow the DEC algorithm and place it on the same row as the upper pixel of the last sixel, or do you place it on the last row touched by the sixel (i.e. the row containing the bottom pixel of the last sixel)?

I know at least some of you have been following the discussions between @hackerb9 and j4james, but I don't know what you ended up implementing. From an application point of view, I think it would be beneficial if we all implemented the same cursor placement algorithm...

Foot currently implements the DEC algorithm, but I think it would be easier for applications if I changed it to just place the cursor on the last row. Then, to print text under the sixel, you know all that's needed is (always) a single newline. Not one or two.

But, I think it's a bad idea to change foot if all other sixel terminals implement the DEC algorithm, and don't want to change.

@PerBothner

I agree putting the cursor on the row containing the bottom sixel row makes more sense, and I can certainly change it if that's the consensus. I prefer to match xterm.js for various reasons. https://github.com/jerch - what do you think?

@jerch

jerch commented Mar 3, 2024

@PerBothner Imho xterm.js currently keeps the text cursor at the row of the bottom-most pixel drawn from the last sixel band. This means that if the last band contains only "fiftels" (the 6th pixel is never set), the 5th pixel would be the last one, not the sixth anymore.
I did this to allow printing pictures whose height is not a multiple of 6 px and still properly align them at the bottom, without a nonsensical excess row or excess space at the bottom.
(There is still a bug attached to it, where empty sixel bands at the end might get truncated - jerch/node-sixel#58)
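The "bottom-most drawn pixel" rule can be illustrated with a small Python sketch. It is not xterm.js code: it assumes the last band's data is available as a string (without any trailing `-`) and relies on the sixel encoding where a data character `c` in `?`..`~` carries the 6-bit mask `ord(c) - 0x3F`, bit 0 being the top pixel. The function names are hypothetical.

```python
def sixel_bits(ch: str) -> int:
    """Decode one sixel data character (0x3F..0x7E) into its 6-bit pixel mask.
    Bit 0 is the top pixel of the band, bit 5 the bottom one."""
    return ord(ch) - 0x3F


def bottom_drawn_pixel(band: str):
    """Return the 0-based index (0..5) of the lowest pixel set anywhere in a band,
    or None if the band draws nothing.

    Characters outside the sixel data range (color introducer '#', repeat '!',
    digits, ';', '$') are skipped, so color and repeat commands are tolerated.
    """
    mask = 0
    for ch in band:
        if 0x3F <= ord(ch) <= 0x7E:
            mask |= sixel_bits(ch)
    if mask == 0:
        return None
    return mask.bit_length() - 1
```

A band made only of `^` (mask `0b011111`, a "fiftel") yields 4, so the cursor would sit one pixel higher than with a full `~` band (yields 5). This is also why an application would have to inspect the image data to predict the cursor position under this rule.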

@wez

wez commented Mar 3, 2024

I'm open to tweaking wezterm to be more sane, assuming that there are a couple of test cases with examples of where the cursor should end up.

FWIW, I think the current cursor placement in wezterm may well be a bit of a fluke arising from re-using the iterm2 image protocol logic that preceded it rather than a conscious effort to implement the vt340 algorithm.

wezterm's logic for this (shared by iterm2, kitty and sixel handling) can be found here:
https://github.com/wez/wezterm/blob/22424c3280cb21af43317cb58ef7bc34a8cbcc91/term/src/terminalstate/image.rs#L65

the vertical position:
https://github.com/wez/wezterm/blob/22424c3280cb21af43317cb58ef7bc34a8cbcc91/term/src/terminalstate/image.rs#L166-L170

the horizontal position:
https://github.com/wez/wezterm/blob/22424c3280cb21af43317cb58ef7bc34a8cbcc91/term/src/terminalstate/image.rs#L233-L246

@hpjansson
Owner

It may be a good idea to get @arakiken and the other mlterm developers on board too. I've been testing with it, since it had one of the first implementations, and is still one of the fastest. It currently (as of version 3.9.3) places the cursor on the row immediately after (that is, the first character row not touched by any sixel, transparent or not).

My main concerns as an application writer are a) consistency between terminals and b) simplicity of design. I'll happily support any consensus terminal developers arrive at.

Imho xterm.js currently keeps the text cursor at the row of the bottom-most pixel drawn from last sixel band. Means if the last band contains only "fiftel" (6th pixel never set), the 5th pixel would be the last one, not the sixth anymore.

I favored this approach at first, but it has the minor annoyance of deliberate image transparency being cut off. It also means applications must inspect the image data in order to know where the cursor'll end up, which is a slightly bigger problem. Correct me if I'm wrong and there's a way around this.

@hpjansson hpjansson added research Research & discussion compatibility Compatibility (e.g. terminal quirks) labels Mar 4, 2024
@dnkl
Contributor Author

dnkl commented Mar 4, 2024 via email

@dankamongmen

following along for notcurses, good to see this effort taking place

@AnonymouX47

Sorry to intrude....

I just want to add that if it's possible to also consider the horizontal cursor position, it'd be really good (from the perspective of an application developer).

A unified vertical position is good enough for aligning images with text or other images vertically, but not horizontally (i.e. side by side).

Yes, it's probably possible to work around this using absolute cursor positioning or save/restore, but these are not always viable options, plus I believe the purpose of a consensus includes eliminating the need for workarounds in applications anyway.

Thank you all.

@dnkl
Contributor Author

dnkl commented Mar 4, 2024 via email

@dnkl
Contributor Author

dnkl commented Mar 4, 2024 via email

@dankamongmen

My understanding is the text cursor's horizontal position isn't changed at all. It only moves vertically. Put another way, it is positioned "at the beginning of the sixel", i.e. in the bottom left corner of the sixel.

i went and looked at what we do in notcurses, and we do a hard cursor position after emission of any sixel. i imagine any application wanting to be portably correct will have to do the same thing, no? since they might be dealing with old terminals, or noncompliant ones, and it's not indicated via term queries? i don't want to disrupt unification, but from an app/toolkit author's perspective, i don't see how this helps...?

@dnkl
Contributor Author

dnkl commented Mar 4, 2024 via email

@jerch

jerch commented Mar 4, 2024

@hpjansson

I favored this approach at first, but it has the minor annoyance of deliberate image transparency being cut off. It also means applications must inspect the image data in order to know where the cursor'll end up, which is a slightly bigger problem. Correct me if I'm wrong and there's a way around this.

No, you are right. This "bottom-most colored pixel" behavior cuts off a fully transparent line of pixels at the bottom as not being part of the original image. If an image has that line intentionally, it will get stripped. That's for level 1 sixel.

Correct me if I'm wrong and there's a way around this.

Well, I put another warning into the docs not to use level 1 sixel on the encoder side anymore, but to go with level 2 with explicit raster attributes denoting the width and height extents. DEC STD 070 also tells us that the graphics extents in the raster attributes should never be exceeded by encoders, so my decoder uses these to trim the graphics, which also solves the issue of non-multiple-of-6 image heights in a more deterministic way.
We already had several discussions about the worth of the sixel chapter in DEC STD 070 and how it deviates even from DEC's own machines. Imho DEC STD 070 is the only lengthy source from DEC that tries to sound normative, e.g. by implying certain limits on the sixel format, like height and width, or the 256-color-slot rule. Maybe they did that to bring it in line with other industry standards of that time (I guess 256-color support was at the top end of 80s hardware capabilities), but it never really came to life, as they soon stopped the whole sixel line.
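Trimming to the raster attributes, as described above, might look like the following Python sketch. It is only an illustration under stated assumptions, not the xterm.js/node-sixel implementation: the function names are hypothetical, only the first RA (`"Pan;Pad;Ph;Pv`) is honored, omitted parameters are reported as 0, and bands are counted by Graphics New Lines (`-`), so the data must not end with a trailing `-`.

```python
import re

# A raster attributes command: '"' followed by Pan;Pad;Ph;Pv (any may be omitted)
RA_RE = re.compile(r'"([0-9;]*)')


def parse_raster_attributes(body: str):
    """Return a list of (Pan, Pad, Ph, Pv) tuples, one per '"' command in the
    sixel data. Omitted parameters are reported as 0."""
    result = []
    for match in RA_RE.finditer(body):
        parts = match.group(1).split(";")
        nums = [int(p) if p else 0 for p in parts] + [0, 0, 0, 0]
        result.append(tuple(nums[:4]))
    return result


def trimmed_height(body: str) -> int:
    """Image height after trimming to the RA vertical extent, if one is given."""
    band_count = body.count("-") + 1  # each '-' (Graphics New Line) starts a new band
    drawn = band_count * 6            # height actually covered by the sixel bands
    ras = parse_raster_attributes(body)
    if ras and ras[0][3] > 0:
        return min(ras[0][3], drawn)
    return drawn
```

For example, two bands cover 12 pixels, but with `"1;1;2;10` the image is trimmed to 10 pixels, which is how a height that is not a multiple of 6 can be expressed deterministically.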

@AnonymouX47

I just want to add that if it's possible to also consider the horizontal cursor position, it'd be really good (from the perspective of an application developer).

That's not possible with sixel level 1; it has no notion of width. Every sixel band can extend a different distance to the right (an image might be ragged on the right), so which one should be chosen?
Sixel level 2 brings width and height with its raster attributes, so yes, that could be used for a right border.
To support both conformance levels, only the starting cursor offset within a line can be determined, which basically leads to the VT340 cursor mode.

Btw xterm.js also uses the VT340 cursor for IIP as the only supported cursor mode to level out image sequence differences. While it is more annoying to deal with that cursor mode as app dev, if you want to place text right of the image, its handling is always the same:

  • know the initial cursor position, either by tracking it in your own buffer state or by doing an explicit CPR
  • place the image of x*y pixels, either with sixel or IIP
  • derive the image's output size in cols;rows from the TE's grid resolution (to get the TE's grid resolution, either do an ioctl or CSI 14/18 t)
  • move the text cursor up/right by the image's rows/cols
  • write your text
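The arithmetic in these steps can be sketched in Python. This is a hedged illustration: the helper names are hypothetical, the replies are assumed to follow xterm's `CSI 14 t` (text area in pixels, reply `CSI 4 ; height ; width t`) and `CSI 18 t` (text area in characters, reply `CSI 8 ; rows ; cols t`) formats, and the cell size is assumed to divide the text area exactly. Instead of relative up/right moves it uses an absolute CUP from the known start position, which amounts to the same thing once that position is known.

```python
import re

# Replies to CSI 14 t and CSI 18 t, e.g. "\x1b[4;480;800t" and "\x1b[8;24;80t"
REPORT_RE = re.compile(r"\x1b\[(\d+);(\d+);(\d+)t")


def parse_window_report(reply: str):
    """Parse a 'CSI Ps ; a ; b t' reply into the integer tuple (Ps, a, b)."""
    m = REPORT_RE.fullmatch(reply)
    if m is None:
        raise ValueError(f"unexpected reply: {reply!r}")
    return tuple(int(g) for g in m.groups())


def image_cells(img_w: int, img_h: int, px_reply: str, chars_reply: str):
    """Number of (columns, rows) an img_w x img_h pixel image covers."""
    _, height_px, width_px = parse_window_report(px_reply)
    _, rows, cols = parse_window_report(chars_reply)
    cell_w, cell_h = width_px // cols, height_px // rows
    # Ceiling division: a partially covered cell still counts as covered.
    return -(-img_w // cell_w), -(-img_h // cell_h)


def cursor_right_of_image(start_row: int, start_col: int, img_cols: int) -> str:
    """CUP (1-based) to the image's top row, just past its right edge."""
    return f"\x1b[{start_row};{start_col + img_cols}H"
```

For a 205x101 pixel image on an 800x480 pixel, 80x24 cell text area (10x20 cells), `image_cells` gives 21 columns and 6 rows; if the image was placed at row 3, column 1, text can then be written after `cursor_right_of_image(3, 1, 21)`.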

@dnkl
Contributor Author

dnkl commented Mar 4, 2024 via email

@jerch

jerch commented Mar 4, 2024

I'd be more than happy to change it, to instead truncate the image to the width/height specified in the raster attributes. It'd just make everything simpler on the terminal side.

Yepp, it reduces code complexity a lot, and on the perf side it is actually ~40% faster during sixel decoding, because the upper bounds are known beforehand in my decoder.

@AnonymouX47

AnonymouX47 commented Mar 4, 2024

Thanks (@dnkl, @dankamongmen and @jerch) for the clarifications and suggestions. I guess I can work with those.

EDIT: ... as regards cursor horizontal placement.

@hpjansson
Owner

@dnkl

related, but perhaps worth its own issue; chafa currently ends the sixel with a GNL ('-'). Is this intentional? It adds an extra, empty, graphical row. I think it would be better to use a textual newline instead. Fwiw, this behavior has a (very) minor performance impact on foot, for sixels with an explicit width/height, as we're forced to reallocate and enlarge the backing image buffer, and then initialize it to the background color. I'm not really bothered by it, but thought it might be worth mentioning at least. Be happy to move this to a separate issue if you'd prefer that.

I don't remember exactly how intentional it was, but when I wrote most of the encoder back in 2018 I had to work around issues in existing decoders. For instance, I specify the raster dimensions but still make sure to pad every sixel row to the full width, since I noticed a case where the terminal would have garbage in the image buffer otherwise. It's possible the GNL was required by a decoder at some point.

That said, after testing it again now, it seems e.g. mlterm behaves the same with and without the GNL; I think it opens a new sixel row only when its pixel data starts arriving. I don't know of anything that needs the final GNL anymore, so I'll remove it.

I'm also partial to the idea that raster attributes should preempt dynamic resizing. It makes things more predictable for everyone.

@hackerb9

hackerb9 commented Mar 6, 2024

I'm glad to see all the terminal developers here working together!

If I can summarize, it sounds like everyone is in agreement that modern terminals should allow what I will refer to as splat-nl-print: Applications may send sixels to a screen and simply send a newline before any text if they do not wish to overlap the graphics. Although VT340 compatibility is not the highest priority, I can add that my tests show splat-nl-print as the algorithm of choice even on a real VT340 as the occasional glitch is vanishingly rare in actual usage.
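As an application-side sketch of splat-nl-print, the Python below wraps sixel data in DCS ... ST, drops any trailing Graphic New Line (as suggested in the points below), and then moves to the next text row. The function name is hypothetical. It assumes `sixel_data` is everything after the DCS introducer (the `q` selector plus picture data). A plain `'\n'` written through a tty with the usual ONLCR setting also returns to the first column; IND (`ESC D`), mentioned later in the thread, moves down one row without changing the column.

```python
ESC = "\x1b"
DCS = ESC + "P"   # Device Control String introducer
ST = ESC + "\\"   # String Terminator
IND = ESC + "D"   # Index: down one row, column unchanged


def splat_nl_print(sixel_data: str, text: str, keep_column: bool = False) -> str:
    """Emit a sixel image, step to the next text row, then write text.

    Trailing Graphic New Lines ('-') are stripped so the image does not grow
    an extra, empty band before the text.
    """
    body = sixel_data.rstrip("-")
    step = IND if keep_column else "\n"
    return DCS + body + ST + step + text
```

Under the behavior summarized here, exactly one step is ever needed, regardless of the image height or the font size.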

Additional points brought up:

  • Should the width and height specified in the Raster Attributes (RA) be used as a clipping box despite DEC's documentation explicitly stating RA does not limit the size of the image? Personally, I think, "Yes". It is a reasonable optimization for modern terminals when there is exactly one RA present in the sixel data stream. However, I would also hope modern terminals would be robust enough to fall back to unoptimized rendering when necessary — for example, no RA in the image, multiple RAs, an RA with zero width / height, or data where the program doesn't know the size ahead of time. (Sidenote: I do not expect any modern emulator to be able to handle @jerch's endless scrolling sixels.)

  • Should the text after a new line overwrite transparent pixels at the bottom of the graphic? I believe so unless the RA width and height specify otherwise.

  • Should Graphic New Line scroll the screen immediately before pixel data is received? Yes, I think that is correct. And applications encoding sixels should not output a final Graphic New Line at the end of the stream.

  • How can positioning of text to the right of a sixel image be made easier for application developers? I agree that a new issue should be created to discuss this. (If someone does, please @ me in the discussion as I'm curious about possible solutions.)
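The first point above (use the RA as a clipping box only when there is exactly one RA with non-zero width and height, and otherwise fall back to unoptimized rendering) can be expressed as a small decision function. This is a sketch: `clip_box` is a hypothetical name, and a real decoder would pick up the RA during its normal scan rather than with a regex.

```python
import re

# A raster attributes command: '"' followed by Pan;Pad;Ph;Pv
RA_RE = re.compile(r'"([0-9;]*)')


def clip_box(sixel_data: str):
    """Return (width, height) to preallocate and clip to, or None to fall back
    to unbounded, dynamically growing rendering."""
    ras = RA_RE.findall(sixel_data)
    if len(ras) != 1:
        return None  # no RA, or several RAs: don't trust any single one
    parts = [int(p) if p else 0 for p in ras[0].split(";")] + [0, 0, 0, 0]
    width, height = parts[2], parts[3]
    if width == 0 or height == 0:
        return None  # zero or omitted extent gives no usable box
    return width, height
```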

@j4james

j4james commented Mar 6, 2024

If you're going to define your own version of Sixel, can you please make it something that apps can opt into or out of with a mode. Worst case, if you don't want to implement both standard and non-standard cursor placement, you could still report the mode as permanently set, and then apps can at least tell what behavior to expect from the terminal.

@PerBothner

Have you tested recent versions of xterm? I think it is desirable to be compatible with xterm. It may be a good idea to contact Thomas E. Dickey, the maintainer of xterm. He has tweaked the handling of Sixels in the past, and may be open to (if necessary) doing so to match the "saner" behavior.

@PerBothner

@j4james I don't believe there is a "standard version" of Sixel. That is part of the problem: different implementations act differently. Is "standard Sixel" whatever DEC implemented in their terminals? Are all such terminals consistent? What about the specifications (manuals) from DEC? What about corner cases not covered in the manuals? What about xterm, and which version of xterm? If all of these were consistent, I'd consider that "standard sixel", but I'm pretty certain that is not the case.

@hackerb9

hackerb9 commented Mar 6, 2024

Have you tested recent versions of xterm? I think it is desirable to be compatible with xterm. It may be a good idea to contact Thomas E. Dickey, the maintainer of xterm. He has tweaked the handling of Sixels in the past, and may be open to (if necessary) doing so to match the "saner" behavior.

UPDATE: I have determined that I was mistaken about Xterm's behavior regressing. In fact, it is now almost precisely correct. The one thing it is missing, however, is moving the text cursor down on Graphic New Lines, which just happens to be the default output from ImageMagick's convert tool. @ThomasDickey.

@hackerb9

hackerb9 commented Mar 6, 2024

Here is a new script, textcursor2.sh, which shows how a TEXT NEW LINE (or, equivalently, CURSOR DOWN) separates a sixel image from any following text on a VT340 with its 20 pixel high character cell.

textcursor2

It also shows what happens when GRAPHIC NEW LINE is used; the most important feature of which seems to be that it acts exactly like a single text new line whenever the image height is a multiple of the character cell height.

@jerch

jerch commented Mar 7, 2024

@ all and some in particular:
Idk why the discussions on these things always have to heat up. It's also tiresome to filter the important bits out of the rants, and it completely waters down the goal of finding a model that works for both sides: current TEs with their technical limitations, and the app side for screen state control and convenience.

@hackerb9

hackerb9 commented Mar 8, 2024

I wouldn't mind if there were some way for the client to attach a hint indicating what kind of sixel regime it's expecting for a given image, possibly chosen from a set advertised by the terminal emulator. Ideally not just for cursor positioning, but cell size too.

Interesting idea, but I think we're getting a bit far afield. If someone starts an issue to discuss this, I'd like to be included on it.

Practical use cases for this come up now and then. I think someone in the MS Windows sixel support thread mentioned issues with sixel-based monitoring software still running in older power plants that had to be migrated from DEC to software TEs.

Click here to read hackerb9's digression on this.

Perhaps I'm misunderstanding here, but power plants should be fine. If they are presuming sixel graphics always extend the exact same number of character cells as a VT340 in 80 column mode, they can set their font to be a 10x20 bitmap. That already works great in Xterm.

Then TEs could default to a cell size (e.g. 10x20) depending on their emulation mode, but allow printing of sixels emitted for a different resolution while maintaining a correct character extent by applying a scaling factor. This is clearly extra work on the TE side, but if you support changing the font size at runtime you're probably keeping track of image-to-cell scaling factors already.

Again, I'm likely misunderstanding, but I hope modern TEs do not try to decouple the resolution of sixel graphics from the resolution of the font. The behaviour of sixel graphics when the terminal changes font size is undefined, but if you want to copy the VT340, the graphics should simply be erased as that's what happens when switching to 132 column mode. App programmers (at least on UNIX) will get a WINCH signal and redraw the screen. Since programmers can already determine the number of pixels per character cell, there is no extra work TE devs need to do. Why make it any more complicated?

As for coming up with a way to allow for sixels emitted at different resolution, I will briefly mention that the sixel protocol actually already has support for that. In addition to pixel aspect ratio settings which you may know about, device independent scale is defined by the ANSI SSU escape sequence (select size units) and the grid parameter Pn3. Although only DEC's printers actually obeyed it, the VT340 always generates sixel images with the correct real-world scale embedded in them so that hardcopy on any device would be exactly the same size as the screen.

Click to view excerpt from DEC-070 section 7.8

Level 2 Sixel Devices -

ASSUMPTIONS: Level 2 sixel devices support the Set Raster
Attribute command, Background Select, Horizontal Grid Size and
Macro Parameter commands.

Sixel control strings are sent as follows:

ESC P  Ps1 ; Ps2 ; Pn3 q " Pn4 ; Pn5 ; Pn6 ; Pn7  ******  ESC \

\___/  \_______________/ \_____________________/  \____/  \___/
 DCS   Protocol Selector    Raster Attributes     Picture  ST
                                                   data
\______________________/ \_____________________________/ 
 DCS Introducer Sequence     sixel data

Where:

  • Ps1 Is the Macro Parameter and is always ZERO.

  • Ps2 Background Select

    • 1 if Background Printing is disabled in Set-Up
    • 2 if Background Printing is enabled in Set-Up
  • Pn3 Horizontal Grid Size, given in units specified
    by ANSI SSU (default is decipoints, 1/720 inch).
    For default size units, the grid size should be
    6 for COMPRESSED and 9 for EXPANDED or ROTATED print.


    Since the host can change the printer between accesses,
    SSU should be sent once before each sixel dump.

        ESC  [    2    SP   I       (Set Size Unit
        1/11 5/11 3/2  2/0  4/9      to Decipoints)
    
  • Pn4 Pixel aspect ratio numerator, 1

  • Pn5 Pixel aspect ratio denominator, 1

  • Pn6 Horizontal extent
    (number of pixels in image horizontally)

  • Pn7 Vertical extent
    (number of pixels in image vertically)
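The control string layout in this excerpt can be assembled in Python as a sketch. The function names are hypothetical; Ps1 is fixed at 0 and the aspect ratio at 1:1 as the excerpt specifies, and the picture data and ST still have to be appended by the caller. The SSU sequence matches the byte listing above (ESC [ 2 SP I).

```python
ESC = "\x1b"
# SSU: CSI Pn SP I; Pn = 2 selects decipoints (1/720 inch), as in the excerpt
SSU_DECIPOINTS = ESC + "[2 I"


def level2_sixel_header(ps2: int, pn3: int, width: int, height: int) -> str:
    """DCS 0 ; Ps2 ; Pn3 q followed by raster attributes "1;1;width;height."""
    return f'{ESC}P0;{ps2};{pn3}q"1;1;{width};{height}'


def printed_width_inches(width_px: int, pn3: int) -> float:
    """Physical width when each pixel is pn3 decipoints (pn3/720 inch) wide."""
    return width_px * pn3 / 720
```

With the default units, a 720-pixel-wide dump is 6 inches wide in COMPRESSED print (grid size 6) and 9 inches in EXPANDED or ROTATED print (grid size 9).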


While it'd be pretty cool if some modern terminal actually implemented device independent graphics resolution based on SSU, I think developer time would be better spent doing the reverse: MediaCopy to host. This was a graphics feature the VT340 supported and I'd find it immediately useful on terminal emulators. The escape sequence requests that the terminal send an image of the screen back to the host as a sixel file. You can probably imagine how handy this would be from the command line, though of course a modern TE would want to address privacy by making it something the user has to explicitly allow.

But I digress from my digression... 😄

@AnonymouX47

AnonymouX47 commented Mar 8, 2024

To everyone...

Just want to say I think digressions are okay (though they should be avoided when possible), but I've noticed and would like to encourage @hackerb9's style of putting them (and replies to them) within <details> (with <summary>) tags in order to ease the filtering of important bits.

Thank you all.

@PerBothner

PerBothner commented Mar 8, 2024

As I understand it, the controversial issue is: How should applications and terminals handle this common case: How to print an image (with no mixing of text and image on the same line), and then move the cursor to the first line below the image (that does not overlap any of the pixels)?

In "standard sixel" you can do this if the application knows how many sixel image lines there are per text line (the text height in "logical pixels"). But this adds some complication and isn't necessarily portable. The "standard" has the awkward (somewhat useless) behavior that the text cursor is moved to the line of the top sixel row of the last band of sixels, which means the application may need to send one or two newlines.

One idea: Define a "fresh-line" escape sequence (defined below). Then the application writes the image, followed by CR, then LF, then "fresh-line".

"Fresh-line" has no effect if the cursor is at the first column of an empty line, where "empty" means no characters and no overlap with a previous image. Otherwise, it is equivalent to CR+LF.

"Fresh-line" should be an escape sequence that behaves like CR+LF in terminals that are not aware of it. One possibility is:

\r\e[1;9D

The 9 is a modifier that tells the terminal to ignore the command if at the start of an empty line.
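The semantics of the proposed fresh-line can be modeled from the terminal's side in a few lines of Python. This is only a sketch of the proposal: `FRESH_LINE` is the encoding suggested above (not a standardized sequence), and the `CursorLine` type and `fresh_line_action` function are hypothetical stand-ins for whatever state a terminal actually tracks.

```python
from dataclasses import dataclass

# The encoding proposed above; not a standardized sequence.
FRESH_LINE = "\r\x1b[1;9D"


@dataclass
class CursorLine:
    column: int            # 0-based column of the text cursor
    has_text: bool         # any characters written on this row
    overlaps_image: bool   # any image pixels covering this row


def fresh_line_action(line: CursorLine) -> str:
    """What a fresh-line-aware terminal would do: nothing if the cursor is at
    the first column of an empty row, otherwise the equivalent of CR+LF."""
    empty = not line.has_text and not line.overlaps_image
    if line.column == 0 and empty:
        return ""
    return "\r\n"
```

After an image, a row that still overlaps the image's pixels counts as non-empty, so fresh-line advances past it; on a genuinely clean row it does nothing.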

The fresh-line sequence is useful in other applications besides sixel output. For example, a shell (or other REPL) can emit it before a prompt if it is unsure whether the previous command properly ended with a CR+LF sequence. (Though in this case the default is probably wrong: assuming most commands correctly end with CR+LF, we want fresh-line to be ignored on terminals that don't recognize it. Perhaps we want to specify two alternate encodings for fresh-line: one that is ignored if not recognized, and one that acts as CR+LF if it is unrecognized. DomTerm implements \e[20u for the former.)

@hpjansson
Owner

hpjansson commented Mar 8, 2024

That's an interesting proposal, though ideally I'd wish for something that can be used without returning to the first column, and which doesn't require a blank row. Chafa (by request) has a --relative switch that's supposed to print output at the current cursor location, leaving it immediately below the bottom-left corner. This works for symbol graphics, but the promise is hard to uphold for other image protocols.

Example image - click to expand

@hackerb9

hackerb9 commented Mar 8, 2024

As I understand it, the controversial issue is: How should applications and terminals handle this common case: How to print an image (with no mixing of text and image on the same line), and then move the cursor to the first line below the image (that does not overlap any of the pixels)?

Fortunately, there is not much genuine controversy on that point: just send a single newline, '\n'. Easy peasy.

Click to read Hackerb9's humble opinion

The reason there seems to be controversy is that the vt340 can have an extremely rare quirk where the text will overwrite a few pixels at the bottom of the image. Depending upon your goal for the terminal emulator you are writing, this may or may not be important. It's not a major graphical problem and it almost never happens. Still, a terminal which wants to be a faithful clone of the VT340's behaviours would of course care about this nuance. The cost of attempting to replicate it is high, causing other design trade-offs and adding complexity not just for the terminal developers but also for application programmers. Terminals which aim to be useful in modern times would be well advised to skip the quirk.

I believe it was not a design goal but a compromise. The VT340's "top pixel" heuristic is a quick approximation for a calculation that was too expensive at the time: "bottom pixel". Fortunately -- or perhaps by design -- the VT340's character cell height of 20 pixels makes that heuristic work just like "bottom pixel" in nearly every case.

I had thought this glitch was a bug in the VT340 but, after looking into it deeply, I am actually extremely impressed with the engineers from DEC. They came up with a clever solution that nobody noticed at the time was any different from the correct calculation. DEC's lack of documentation on this point would be surprising given how thorough the manuals are until one realizes it was probably omitted on purpose. If people knew the trick the VT340 was using, they might start relying on the quirky behaviour and future terminals would be obligated to support it.

Modern terminals have no need to approximate the calculation of the bottom-most opaque pixel as processors are not as limited as they were in the 1980s.

Even if they wanted to, there is no benefit to trying to extrapolate what the VT340 heuristic would be in modern times. Whatever it is, it is certainly not just picking the top pixel as that doesn't work for other character cell heights. Trying to salvage it by presuming all character cells are 10x20 regardless of the font size causes a cascade of other problems, the worst being that high res images require making the font size imperceptibly small.

And, even if some terminal did implement a heuristic that worked at any font size, it would be useless to application programmers. Calculating when to send two newlines is unnecessarily complicated and sometimes not even possible.

Consider the case where a program wants to display files that contain sixel screen dumps, perhaps captured by the VT340's MediaCopy. Since each file can contain an arbitrarily sized region, the program doesn't know ahead of time how high the image is in pixels. The only sane thing to do would be to send a single newline and presume one is enough to get the text cursor to a free line. This works on a VT340 so close to always that it isn't worth it to try to work around the occasional glitch.

In summary: We're talking about a very minor and rare graphical glitch that can occur on the VT340. While interesting from a historical perspective, only a precise VT340 emulator needs to care about such quirks. There is no benefit to copying this behaviour of the VT340 to modern terminals and much harm.


@PerBothner: Although not appropriate for sixel graphics, I could see your fresh-line proposal being useful for other situations, such as to make sure the prompt is located correctly after a program dies abruptly.

@hpjansson: To not return to the first column after displaying sixels, use IND, '\eD', instead of newline. If a terminal has newline working correctly, then IND should work, too.
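A minimal Python sketch of that suggestion: wrap the sixel data in a DCS string and follow it with IND (ESC D) instead of "\n". When a program writes "\n" through a tty with the usual ONLCR output flag set, the kernel sends CR LF, which is why a "newline" returns to the first column; IND moves down one line (scrolling at the bottom margin) and leaves the column alone. The helper name `sixel_then_ind` and the sample red bar are illustrative only.

```python
import sys

ESC = "\x1b"
DCS = ESC + "P"   # Device Control String introducer (7-bit form)
ST = ESC + "\\"   # String Terminator (7-bit form)
IND = ESC + "D"   # Index: move down one line, keep the current column


def sixel_then_ind(sixel_body: str) -> str:
    """Hypothetical helper: sixel DCS ('q' selects sixel mode) followed by one IND."""
    return DCS + "q" + sixel_body + ST + IND


if __name__ == "__main__":
    # A 10x6 red bar: define colour register 1 as RGB (100%, 0%, 0%),
    # select it, then repeat the all-six-pixels sixel '~' ten times.
    sys.stdout.write(sixel_then_ind("#1;2;100;0;0#1!10~"))
    sys.stdout.flush()
```

Whether one IND is enough still depends on where the terminal leaves the text cursor after the image, which is the subject of this issue.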

@hpjansson
Owner

hpjansson commented Mar 8, 2024

@hpjansson: To not return to the first column after displaying sixels, use IND, '\eD', instead of newline. If you have newline working correctly, then IND should work, too.

Right - but assuming a DEC-faithful TE, I would have to emit IND once or twice, depending - or rely on some extension such as @PerBothner's suggestion. The central question is "can we conserve DEC sixels but do something else to obviate the need to know where the last sixel band fell in relation to text cells?"

@hackerb9

hackerb9 commented Mar 8, 2024

Right - but assuming a DEC-faithful TE, I would have to emit IND once or twice, depending - or rely on some extension such as @PerBothner's suggestion. The central question is "can we conserve DEC sixels but do something else to obviate the need to know where the last sixel band fell in relation to text cells?"

Just emit IND once, same as a newline. This conserves DEC's sixel design.

@hpjansson
Owner

Just emit IND once, same as a newline. This conserves DEC's sixel design.

Okay - I'll do that (and unless I've misunderstood something, accept that a few pixels may get cut off). I'll get out of your hair now so you can discuss the other aspects (e.g. should raster attributes define a clipping rectangle? :-) Enjoying the conversation.

hpjansson added a commit that referenced this issue Mar 8, 2024
The final GNL could cause extra space to be emitted in some
circumstances.

Also fix an issue causing more bands to be padded than necessary when
multithreaded.

See #192 (GitHub).
hpjansson added a commit that referenced this issue Mar 8, 2024
This positions the cursor correctly ~everywhere.

See #192 (GitHub).
@dankamongmen

Right - but assuming a DEC-faithful TE, I would have to emit IND once or twice, depending - or rely on some extension such as @PerBothner's suggestion. The central question is "can we conserve DEC sixels but do something else to obviate the need to know where the last sixel band fell in relation to text cells?"

maybe i'm misunderstanding the need, but in notcurses i handle what i believe to be your problem by getting the terminal size in pixels, dividing that out by the number of rows and cols, and using those as the cell pixel dimensions. doesn't this provide you enough?
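The approach described above can be sketched in Python with the Unix `TIOCGWINSZ` ioctl, whose `struct winsize` carries `ws_row`, `ws_col`, `ws_xpixel` and `ws_ypixel`. This is an illustration, not notcurses' code, and it assumes a Unix tty. Many terminals leave the pixel fields at zero, in which case another source (such as xterm's `CSI 16 t` cell-size report) would be needed; the result can also go stale if the user changes the font size, as noted further down the thread.

```python
import fcntl
import struct
import sys
import termios


def cell_size_from_winsize(rows: int, cols: int, xpixel: int, ypixel: int):
    """Return (cell_width, cell_height) in pixels, or None if unknown."""
    if rows == 0 or cols == 0 or xpixel == 0 or ypixel == 0:
        return None  # pixel fields not filled in by the terminal
    return xpixel // cols, ypixel // rows


def query_winsize(fd: int):
    # struct winsize is four unsigned shorts:
    # ws_row, ws_col, ws_xpixel, ws_ypixel.
    buf = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
    return struct.unpack("HHHH", buf)


if __name__ == "__main__":
    if sys.stdout.isatty():
        print(cell_size_from_winsize(*query_winsize(sys.stdout.fileno())))
```

Integer division silently drops any padding or border pixels a terminal may include in its reported size, so the result should be treated as an estimate.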

@hackerb9

hackerb9 commented Mar 9, 2024

should raster attributes define a clipping rectangle?

Good question. I've already said I think it's a reasonable, if not ideal, optimization even though it clearly violates both DEC's documentation and actual hardware behaviour.

I should ask, though, does anyone have a good hypothesis for why DEC repeatedly stated that sixel images can extend beyond the rectangle defined by RA? What is lost by taking this optimization?

My working theory had been that DEC probably wanted RA to define a clipping box but their hardware wasn't up to the task. However, that kinda falls apart when I look into it as their "GPU" (DRAGON) actually featured multiple viewports that might have done the job in no time. And, if jerch's results apply, using clipping could have actually made the VT340 run quite a bit faster, not slower. But do they apply? Would the VT340 have seen a significant speed benefit?

@jerch, when you say you get a 40% speed boost, what exactly was the bottleneck? Memory pressure from dynamic allocation of large rectangles?

@dnkl
Contributor Author

dnkl commented Mar 9, 2024

@hackerb9 how do trailing GNLs interact with last transparent rows being clipped? One way of looking at a final, trailing GNL is that it is a completely transparent sixel row (and thus that it should be removed). But perhaps it's more correct to say that a GNL should be treated as a fully opaque row until you start printing sixels; then you start tracking the bottom-most opaque pixel.

when you say you get a 40% speed boost, what exactly was the bottleneck? Memory pressure from dynamic allocation of large rectangles?

I obviously can't speak for @jerch, and I, too, am very curious. However, for me, there's no 40% speed boost just from allowing the raster attributes to act like a clipping region. Foot allocates the entire backing memory when the raster attributes are set. We still have to check for "overflows" (either increase the image size if the sixel cursor goes beyond the raster attributes, or ignore the sixel). Thus, it makes very little difference while processing sixel characters.

There would be a small performance gain, in that we wouldn't have to reallocate the backing image when we encounter "sloppy" encoders that emit a trailing GNL that triggers a vertical resize.
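For the encoder side, a minimal Python sketch of a monochrome sixel encoder that separates bands with GNL ('-') but never emits one after the final band. Each sixel character is 63 plus a 6-bit mask whose bit 0 is the band's top pixel. The function name is hypothetical; it assumes colour register 1 has already been defined by the caller, and it skips run-length ('!') compression for brevity.

```python
def encode_mono(bitmap):
    """Encode rows of 0/1 pixels as sixel data drawn in colour register 1.

    Bands are joined with GNL ('-'); no GNL follows the final band, so the
    terminal is never asked to start an extra, empty band.
    """
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    bands = []
    for top in range(0, height, 6):
        chars = []
        for x in range(width):
            bits = 0
            for i in range(6):
                y = top + i
                # Rows past the bitmap's end stay 0 (background).
                if y < height and bitmap[y][x]:
                    bits |= 1 << i  # bit 0 is the top pixel of the band
            chars.append(chr(63 + bits))
        bands.append("".join(chars))
    return "#1" + "-".join(bands)
```

Note that a 7-row bitmap still produces a second band: its last five rows are background, which is exactly the padding that trimming trailing transparent rows is meant to address.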

Treating it as a clipping region does simplify things, though, and almost removes the need to scan for the last opaque sixel row ;)

But, I'm fine with either way.

@j4james

j4james commented Mar 9, 2024

does anyone have a good hypothesis for why DEC repeatedly stated that sixel images can extend beyond the rectangle defined by RA? What is lost by taking this optimization?

Infinite scroll would be the most obvious example (I'm sure we discussed this somewhere before but I can't find it in your repo right now). You'd also lose some bandwidth saving tricks that could be beneficial when working with non-rectangular output. You can see the sort of thing I mean in the raster dimension tests.

@hackerb9

hackerb9 commented Mar 10, 2024

@hackerb9 how do trailing GNLs interact with last transparent rows being clipped?

Before I get into the weeds about a trailing graphic newline, I do want to say that I think GNL is not as important as getting the text newline behaviour consistent across modern terminals.

One way of looking at a final, trailing GNL is that it is a completely transparent sixel row (and thus that it should be removed). But perhaps it's more correct to say that a GNL should be treated as a fully opaque row until you start printing sixels; then you start tracking the bottom-most opaque pixel.


@j4james is most knowledgeable of precise VT340 behaviour and may even know the exact algorithm for 20 pixel tall fonts off-hand.

For modern terminals, I think perhaps a better question would be why did DEC choose the algorithm they did for the VT340? We've already seen that sometimes they developed fast but inexact algorithms to overcome hardware limitations, so what benefits did the algorithm they chose for the VT340's GNL provide to programmers and users at that time?

With the caveat that I haven't thought this out as deeply as I have text newlines, here's my current take on GNL:

EFFECT OF A TRAILING GRAPHIC NEW LINE ON TEXT CURSOR POSITION

Previous image height           | Behaviour
------------------------------- | -------------------------------------------------
Exact multiple of text height   | Cursor is moved to the blank line under the image
Anything else                   | Has no effect (usually)

It seems that a trailing GNL is practically useless to current programmers as the following text will almost always overlap. The one case it is sure to give a fresh line is not terribly useful since a text newline works the same and is more general.

I don't know the design parameters DEC was constrained by, but it looks an awful lot like an attempt at backwards compatibility. Historically, sixels were designed for printers and teletypewriters in which GNL represented advancing the paper by a fraction of the usual line height.

Excerpt from LJ250 Printer Programmer's Reference Manual

6.3.2.4 Graphic New Line (-)

The graphic new line (GNL) control code (2/13) sets the active column to the [graphic] left margin and advances the paper by the current sixel height.

Since the fractions can add up, it makes sense that some programmers may have relied on printing images at a multiple of the line height and sending a final GNL to move the printhead to the next (whole) text line instead of using an explicit LF. Perhaps this was a common programming idiom and DEC wanted to make sure it still worked on video terminals.

A possible critique and response

One problem with this theory is that printer-terminals, unlike video-terminals, might have been able to print a fraction of a line down, so sizing images to a multiple of the line height might not matter. Response: It's also possible that being aligned to whole lines was important, if not 100% necessary. For example, the manual for the DEC LA100 printer has a caution about using Partial Line Down:

The PLD sequence does not modify the active line. To avoid losing the top of form reference send an equal number of PLU sequences to the terminal.

Another possible reason aligning to whole lines may have been important back in those days was that green bar paper was common, but that seems weak to me.


Even if my above theory is correct, one thing I don't get is why not always advance the text cursor? What, if any, benefit is there to have a trailing GNL stay on the same line?

My first thought was that perhaps the fractional page motion was saved and would be used to align any following sixel images, but no, they overwrite the previous image just like text does. Speed of calculation is likely part of it, but what exactly were they trying to calculate? I suspect this is a historical mystery which won't be solved until someone documents the actual behaviour of something even older than a VT340, perhaps a DECwriter IV printer-terminal.

@dnkl
Contributor Author

dnkl commented Mar 11, 2024

Alright, I now have three open PRs for foot, addressing the following:

  • Place cursor on the last character row touched by the sixel: this is the one I started out this ticket with.

  • limit image size to the one specified in the raster attributes: changes foot from allowing images to grow beyond the dimensions in the raster attributes. This is mostly to sync with @jerch. In the end, it didn't really offer any major benefits, and I would be just as happy to continue supporting dynamically growing the image beyond the raster attributes' dimensions. If I were to decide all on my own, I wouldn't merge this PR, and instead continue supporting dynamic resizes. Note that even with this PR, dynamically sized images are still supported, as long as they omit the raster attributes.

  • trim trailing, fully transparent sixel rows: does what it says. We haven't really discussed the nitty-gritty details of this one, so I chose to do this for all sixels, regardless of the background color mode (i.e. the P2 parameter), and regardless of whether there are any raster attributes present or not.

Is this something you all (though I guess it's pretty clear where @j4james stands on this) would consider implementing in your TEs?

Just to be clear: I don't intend to merge any of the above (1640 being the exception) unless we can reach at least some level of consensus here.

@hackerb9 thanks again for your detailed explanation. What I ended up doing (in 1640), is to let trailing GNLs move the text cursor as if you had at least one fully opaque 6-pixel sixel on that row, but as soon as you start printing sixels, I switch to tracking whatever the actual bottom pixel is. In other words, a trailing GNL will not be trimmed out when we remove trailing, transparent sixel rows.
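The rule described above can be modelled with a small Python sketch that tracks the bottom-most opaque pixel row while scanning sixel data. It is not foot's code. It encodes one reading of the comment: a trailing GNL counts as a fully opaque 6-pixel band until any sixel character (even the empty '?') is drawn after it. Only sixel data characters and '-' are interpreted; colour, repeat, raster-attribute and '$' syntax is skipped because it does not change the vertical extent. The class name is hypothetical.

```python
class BottomRowTracker:
    """Hypothetical model of text-cursor placement after a sixel image."""

    def __init__(self):
        self.band_top = 0          # pixel row at the top of the current band
        self.bottom = -1           # bottom-most opaque pixel row seen so far
        self.pending_gnl = False   # a GNL with no sixels drawn after it yet

    def feed(self, data: str) -> None:
        for ch in data:
            if ch == "-":                       # Graphic New Line
                self.band_top += 6
                self.pending_gnl = True
            elif "?" <= ch <= "~":              # sixel data character
                self.pending_gnl = False
                bits = ord(ch) - 63
                if bits:
                    # Bit 0 is the top pixel, so the highest set bit is the
                    # lowest opaque pixel in this band.
                    lowest = self.band_top + bits.bit_length() - 1
                    self.bottom = max(self.bottom, lowest)
            # '#', '!', '"', '$', ';' and digits are ignored in this sketch.

    def bottom_pixel(self) -> int:
        if self.pending_gnl:
            # Treat the trailing GNL's band as fully opaque.
            return max(self.bottom, self.band_top + 5)
        return self.bottom

    def cursor_row(self, cell_height: int) -> int:
        """Text row (0-based, relative to the image's first row) for the cursor."""
        return max(self.bottom_pixel(), 0) // cell_height
```

For example, feeding "~-" yields a bottom pixel of 11 (the GNL's band counts), while "~-?" yields 5, because the empty sixel after the GNL switches back to tracking real pixels.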

@hpjansson
Owner

@dankamongmen

maybe i'm misunderstanding the need, but in notcurses i handle what i believe to be your problem by getting the terminal size in pixels, dividing that out by the number of rows and cols, and using those as the cell pixel dimensions. doesn't this provide you enough?

It's sufficient, but not ideal:
  1. The final sixel band can fall entirely within the final cell row, or it can fall in multiple cell rows (most likely the final two - but technically if your cells are <= 4px tall, it could be more). When the terminal leaves the cursor at the row containing the topmost pixel of the final sixel band, it means you'll have to move the cursor down by one or two (or perhaps more) rows to get clear of the image. This seems more complex than necessary.
  2. You need an interactive terminal session that can report its pixel size (ioctl or control sequences). A tool like convert can't produce a sixel image occupying a consistent cell extent.
  3. There's probably a race condition where the application gets the terminal dimensions, and while it prepares the image, the terminal's cell size changes. "Zoom" accelerators like the ones VTE implement (C-S-+ and C-S--) allow the user to trigger this easily during animations.

To be clear, I'm not asking for anything in particular to be done about this, just that it's taken into account if terminal maintainers are making changes anyway. IMO, a broad consensus is more important than any of these concerns. Also, as @hackerb9 suggested, we should probably leave points 2 and 3 for a separate issue :-)

@AnonymouX47

AnonymouX47 commented Mar 11, 2024

@dnkl

Considering images that actually have trailing transparent rows, I have a couple of concerns/questions regarding trimming trailing transparent rows:

  1. Won't it affect vertical cursor placement?
  2. Won't it affect drawing an image with P2=0 over another?

@dankamongmen

@hpjansson thanks for the explanation. i work around these three issues, but they're all valid concerns.

@dnkl
Contributor Author

dnkl commented Mar 14, 2024

@AnonymouX47

Won't it affect vertical cursor placement?

Yes, and that's kind of the whole idea. If we don't trim, all images will be forced to have a height that is a multiple of 6.

If we choose to truncate images with raster attributes, we could also choose to not trim trailing transparent rows. But if we don't truncate the image, I think trimming should be done regardless of whether the image has raster attributes or not. Otherwise, an image with raster attributes would still be forced to have a height that is a multiple of 6.

Won't it affect drawing an image with P2=0 over another?

That's a valid question. Not sure if @hackerb9 has any insights on what the real VT340 does? It would kind of make sense to only trim when P2=1.

@AnonymouX47

If we choose to truncate images with raster attributes, we could also choose to not trim trailing transparent rows.

Honestly, I think this approach results in the most reliable/consistent/predictable behaviour and is technically the most straightforward and efficient to implement... both for TE and app developers.

@hackerb9

hackerb9 commented Mar 15, 2024

Won't it affect drawing an image with P2=0 over another?

That's a valid question. Not sure if @hackerb9 has any insights on what the real VT340 does? It would kind of make sense to only trim when P2=1.

Definitely a good question, though straying a bit from the issue nominally at hand (newlines: graphical and otherwise).

I just ran a test of p2 effects on overlaying graphics and the results surprised me.

The rules for overlaying graphics seem to be:

  1. If transparency is on (P2 = 1), everything is composited as expected regardless of the setting of RA.
  2. If transparency is off (P2 = 0) and a size is specified in RA, a rectangle of that size is cleared and the cursor is moved back to the starting corner (top left of image) before drawing sixels. This does not affect the final text cursor.
  3. If transparency is off (P2 = 0) and a size is not specified by RA, then a rectangle of the background color is cleared from the cursor position to the bottom right corner of the screen. This also does not affect the text cursor.
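The three observed rules can be summarised as a small Python function returning the rectangle that gets filled with the background colour before any sixels are drawn. This models hackerb9's VT340 observations above; it is not taken from DEC documentation or any terminal's source. The names are hypothetical, coordinates are in pixels, any P2 value other than 1 is treated as opaque (an assumption), and clipping the rectangle to the screen is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Rect:
    x: int
    y: int
    width: int
    height: int


def background_clear_rect(p2: int,
                          ra_size: Optional[Tuple[int, int]],
                          origin: Tuple[int, int],
                          screen_size: Tuple[int, int]) -> Optional[Rect]:
    """Rectangle cleared before drawing, per the observed VT340 rules.

    p2: 1 = transparent background, otherwise opaque.
    ra_size: (Ph, Pv) from the raster attributes, or None if absent.
    origin: (x, y) of the image's top-left corner.
    screen_size: (width, height) of the screen.
    """
    if p2 == 1:
        return None                    # rule 1: composite, clear nothing
    x, y = origin
    if ra_size is not None:
        w, h = ra_size                 # rule 2: clear the RA rectangle
    else:
        # rule 3: clear from the origin to the bottom-right corner
        w, h = screen_size[0] - x, screen_size[1] - y
    return Rect(x, y, w, h)
```

As observed, none of these rules moves the final text cursor; only the sixel drawing position returns to the image origin.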

№ 3 was the most surprising to me, but I guess it makes sense for a sixel parser: if you don't have any guess what size the graphics actually are, but you know there's an opaque background that must be cleared first, set the RA size to maximum.

This behaviour also fits with how the documentation talks about the RA size parameter not being the actual geometry of the sixel image but rather an easy way to clear a rectangle. (You can see that in my test because I made the 20x20 image have a 60x60 RA size, which matters when transparency is off, P2=0.)

It was also interesting to me that the Raster Attribute size had no effect on the final cursor position. I'm not sure what the benefit is, but I think perhaps it makes sense since multiple Raster Attributes are allowed in a single sixel DCS string.

If you want to test your terminal emulator of choice, you can get my script from here: https://raw.githubusercontent.com/hackerb9/vt340test/main/sixeltests/p2effect.sh . I'm curious to know the results.



Footnote: I think of the VT340 as lacking the rectangle operations that existed in later terminals like the VT4x0. I'm not sure I ever quite grasped before that there is actually an easy way to clear rectangles on the VT340. (And the rectangle doesn't even have to align to the character cell! Not sure if that's a bug or a feature.)

@AnonymouX47

Wow! Ain't that something... Now, i kinda regret asking.

@dnkl
Contributor Author

dnkl commented Mar 15, 2024

@hackerb9 thanks! Those are some interesting results. I'll be making a couple of changes in foot to better match the VT340. I'm also inclined not to make RA truncate images, but instead to keep allowing images to extend beyond their RA, combined with trimming trailing transparent sixel rows.

@wez

wez commented May 5, 2024

I ran the p2effect.sh script on wezterm and xterm.

xterm:
[screenshot]

wezterm:
[screenshot]

Looks a bit wonky in wezterm(!)

@j4james

j4james commented May 6, 2024

@wez I think it's your system that is a bit wonky! Maybe an incompatibility with the shell? Because even the Xterm image has a whole bunch of mistakes that I'm not seeing when I run the script myself. For example, where are all those $ characters coming from? Why are the red blocks offset one column to the right? And why is the cursor text not lined up with the character? Xterm does have a few issues, but it's not nearly as bad as it appears in your screenshot.

@hackerb9

hackerb9 commented May 9, 2024

Hey @wez, I think @j4james is right about your shell. Did you get it figured out? If not, please let me know the output of bash --version as my script should definitely not be doing that. If I recall correctly, MacOS comes with an ancient version of bash.

@wez

wez commented May 10, 2024

Ah, I think I was lazy and didn't chmod the script and just ran it with sh. Running with bash explicitly gives:

xterm:

[screenshot]

wezterm -n:

[screenshot]
