Support inline box layout #25

Open
nicoburns opened this issue Mar 22, 2024 · 10 comments · May be fixed by #67

Comments

@nicoburns
Contributor

nicoburns commented Mar 22, 2024

Motivation

One may wish to mix textual and non-textual content and have the non-textual content laid out in flow with the text, for example to display images or even whole widgets within paragraphs of text. This is necessary in order to implement web-style "inline/flow" layout, but its use is not limited to web layout contexts: it is a feature that is more generally useful to anyone wishing to lay out mixed content.

Notes

The functionality required from the text layout system in order to implement "inline layout" is laying out fixed size boxes, possibly with some kind of supplementary baseline alignment / vertical alignment information. There is no need for the text layout system to size such boxes.

Proposed Implementation

I think we can avoid involving inline boxes in ranged style resolution. Based on that assumption, my proposal is as follows:

  • Update Builder
    • Define:
    struct InlineBox {
        /// The width and height of the box in pixels
        size: kurbo::Vec2,
        /// The index into the text string at which the box should be placed.
        index: usize,
        /// An arbitrary user-defined id that consumers of parley can use to determine
        /// which inline box is which in the resultant layout
        id: u64,
    }
    • Add an inline_boxes: Vec<InlineBox> property to LayoutContext
    • Add a push_inline_box(inline_box: InlineBox) method to RangedBuilder which pushes to the inline_boxes property in the layout context
    • Sort the inline boxes by index in RangedBuilder::finish
  • Update run-splitting:
    • Update shape::shape_text to break text runs at text indexes where an inline box is present (in addition to all of the places where it already does so).
    • Define:
    enum RunOrBox {
        Run(RunData),
        Box(InlineBox),
    }
    • Change LayoutData.runs to LayoutData.runs_or_boxes using the new enum
    • Update shape::shape_text to push inline boxes to LayoutData.runs_or_boxes
  • Update line breaking
    • Define:
    struct InlineBoxPosition {
        /// The x and y position of the box
        position: kurbo::Vec2,
        /// The unique id of the box
        id: u64,
    }
    enum LineRunOrBox {
        Run(LineRunData),
        Box(InlineBoxPosition),
    }
    • Change LayoutData.line_runs to LayoutData.line_runs_or_boxes using the new enum
    • Update BreakLines::break_next to account for inline boxes when line breaking. This should:
      • Compute an (x, y) location for the box (either global or line-relative)
      • Affect the position of subsequent runs of text
      • Affect the line height of the line (the line must be at least as tall as the box)
  • Update alignment
    • Update BreakLines::finish to account for boxes when performing alignment
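
To make the line-breaking step concrete, here is a minimal sketch of greedy breaking over a mixed stream of runs and boxes. It is not Parley's actual BreakLines code: `Item`, `PlacedBox`, `Lines` and `break_lines` are hypothetical names, runs are treated as unbreakable, and boxes are placed at the top of their line with no baseline alignment.

```rust
/// Hypothetical stand-in for a shaped run or an inline box, reduced to
/// the width and height that line breaking needs in this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Item {
    Run { width: f32, height: f32 },
    Box { id: u64, width: f32, height: f32 },
}

/// Position of an inline box, relative to the top-left of the layout.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PlacedBox {
    pub id: u64,
    pub x: f32,
    pub y: f32,
}

#[derive(Debug, Default)]
pub struct Lines {
    /// Height of each finished line, in order.
    pub line_heights: Vec<f32>,
    /// Positions of all inline boxes encountered.
    pub boxes: Vec<PlacedBox>,
}

/// Greedy breaking: an item that does not fit on a non-empty line
/// starts a new line. Assumption: items are never split.
pub fn break_lines(items: &[Item], max_width: f32) -> Lines {
    let mut out = Lines::default();
    let (mut x, mut y, mut line_height) = (0.0_f32, 0.0_f32, 0.0_f32);
    let mut items_on_line = 0usize;

    for item in items {
        let (width, height) = match *item {
            Item::Run { width, height } | Item::Box { width, height, .. } => (width, height),
        };
        if items_on_line > 0 && x + width > max_width {
            // Commit the current line and move down by its height.
            out.line_heights.push(line_height);
            y += line_height;
            x = 0.0;
            line_height = 0.0;
            items_on_line = 0;
        }
        if let Item::Box { id, .. } = *item {
            out.boxes.push(PlacedBox { id, x, y });
        }
        // The item advances the pen, so it shifts everything after it,
        // and the line grows to be at least as tall as its tallest item.
        x += width;
        line_height = line_height.max(height);
        items_on_line += 1;
    }
    if items_on_line > 0 {
        out.line_heights.push(line_height);
    }
    out
}
```

With a 100px-wide container, a 60px run followed by a 30x40 box and a 50px run puts the box at (60, 0) on a 40px-tall first line and wraps the last run onto a second line.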
@xorgy
Collaborator

xorgy commented Mar 22, 2024

Seems like a good start, but also affects Cursor. I think it would be wise to have a text-only interface (maybe keep the current API there) that handles the case of mere text; and implement that on top of the more general interface.

@dfrg
Collaborator

dfrg commented Mar 24, 2024

This looks like the exact right approach for adding inline objects to the current API. The only potential issue I see is bidi handling: objects would essentially be ignored.

Before we get started, I'd like to propose a different approach... something I've been hacking away on when time permits for a few months now. I think we should drop the "attributed text" API, at least as the core layout primitive, and replace it with a tree-based API. The current proof of concept looks like this:

/// User defined identifier for a text element.
pub type TextId = u64;

/// User defined identifier for an inline object element.
pub type ObjectId = u64;

/// User defined identifier for a span element.
pub type SpanId = u64;

impl<'a> Builder<'a> {
    /// Pushes a new span with the given identifier and style.
    ///
    /// Any undefined properties in the style will be inherited from the
    /// parent (or default) span.    
    pub fn push_span(&mut self, id: SpanId, style: &Style) {}

    /// Pops the current span.    
    pub fn pop_span(&mut self) {}

    /// Appends a text fragment.
    ///
    /// The `id` parameter allows tracking the source of the fragment and
    /// the `start` parameter can be used to track the offset where this text
    /// begins in that source.    
    pub fn text(&mut self, id: TextId, start: usize, text: &str) {}

    /// Appends an inline object.
    ///
    /// The `id` parameter allows tracking the source of the object. The
    /// dimensions are used during line layout and breaking.
    pub fn object(&mut self, id: ObjectId, width: f32, height: f32) {}

    /// Consumes the builder and flushes any pending operations.
    pub fn finish(mut self) {}
}

IMO, this is more generally useful than attributed text (which can be built atop this) and goes a long way toward supporting real inline flow layout. The current state does track child -> parent edges for all nodes so the tree structure can be ascertained from the layout with a bit of work.
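
As an illustration of the inheritance rule described in the `push_span` doc comment ("any undefined properties in the style will be inherited from the parent"), here is a small sketch of a span stack. It is not the proof-of-concept code: `Style`, `ComputedStyle` and `SpanStack` are hypothetical types with only two properties, chosen to keep the example short.

```rust
/// Hypothetical partial style: `None` means "not specified, inherit".
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Style {
    pub font_size: Option<f32>,
    pub bold: Option<bool>,
}

/// Hypothetical fully resolved style, with every property defined.
#[derive(Debug, Clone, PartialEq)]
pub struct ComputedStyle {
    pub font_size: f32,
    pub bold: bool,
}

/// Stack of resolved styles; the bottom entry is the default (root) style.
pub struct SpanStack {
    stack: Vec<ComputedStyle>,
}

impl SpanStack {
    pub fn new(root: ComputedStyle) -> Self {
        Self { stack: vec![root] }
    }

    /// The style that applies to text appended right now.
    pub fn current(&self) -> &ComputedStyle {
        self.stack.last().expect("the root style is never popped")
    }

    /// Resolves `style` against the current span and makes it current.
    pub fn push_span(&mut self, style: &Style) {
        let parent = self.current().clone();
        self.stack.push(ComputedStyle {
            font_size: style.font_size.unwrap_or(parent.font_size),
            bold: style.bold.unwrap_or(parent.bold),
        });
    }

    /// Pops the current span; the root style always stays in place.
    pub fn pop_span(&mut self) {
        if self.stack.len() > 1 {
            self.stack.pop();
        }
    }
}
```

Attributed text could then be lowered to this by walking non-overlapping ranges in order and emitting one push/pop pair per range, though that lowering is not shown here.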

The complexity of the cursor type, and text selection/navigation more generally, will likely grow in response to this but I think that's a worthwhile trade-off. We can provide a simplified API mode if necessary.

@nicoburns
Contributor Author

  1. Yeah, I realised after I wrote the above that we might need to do something with bidi for objects. I suspect they would generally be neutral (and that would be good enough for a first cut), but we might want to allow the author to explicitly set a bidi direction in future (in case the box happens to contain text).

  2. Would this tree-based approach still resolve down to a set of non-overlapping ranges? If so then keeping both would probably be easy enough. And potentially the "resolution" functionality (range and tree based) could be spun out into a standalone library (operating over trait(s)).

  3. Having said that, one thing I was wondering about is if/how style resolution could be handled incrementally. It seems entirely non-ideal to have to create a fresh Builder every time the text and/or styles change.

    One way in which this could be improved is if the style resolution could operate over a custom tree rather than needing to be converted into parley types (similar to how selectors, html5ever and taffy work: you have your own tree representation for which you implement a trait that the library operates over). That would at least eliminate the need to push styles into a builder every time. But it wouldn't make the actual resolution incremental.

    If that's not workable then perhaps another way would be to allow Builders to be retained, mutated and then reused.

  4. Do you have an estimate for when this might be available? I feel like it's going to be difficult for me to contribute to parley while large chunks of it (both this and fount -> fontique) are liable to be replaced (Xilem has a similar problem with Masonry).

@xorgy
Collaborator

xorgy commented Mar 24, 2024

RE: 1. Might be reasonable to somehow make the object aware of the writing direction (and mode, if we implement it).

RE: 4. I don't think there is much planned to change right this moment outside of font libraries/fallback (fount/fontique) but I might be wrong about that. The rest is working fine for current use cases.

@dfrg
Collaborator

dfrg commented Mar 25, 2024

Seems like we're all on the same page wrt object bidi. The current code just adds an object replacement character so it's boundary neutral but we can easily use a strong LTR/RTL character instead, or surround the BN character with LRI/RLI+PDI controls depending on what works best. We could expose this with a writing direction property as Aaron mentioned.
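
The isolate option can be sketched as follows. This only shows how a placeholder might be written into the paragraph buffer before bidi analysis; `ObjectDirection` and `push_object_placeholder` are hypothetical names, not part of the current code. The code points themselves are standard Unicode: U+FFFC (object replacement character), U+2066 (LRI), U+2067 (RLI) and U+2069 (PDI).

```rust
use std::ops::Range;

/// Hypothetical writing-direction property for an inline object.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ObjectDirection {
    /// Just the placeholder; its direction resolves from context.
    Neutral,
    /// Placeholder wrapped in a left-to-right isolate.
    Ltr,
    /// Placeholder wrapped in a right-to-left isolate.
    Rtl,
}

const OBJECT_REPLACEMENT: char = '\u{FFFC}';
const LRI: char = '\u{2066}';
const RLI: char = '\u{2067}';
const PDI: char = '\u{2069}';

/// Appends a placeholder for an inline object to the paragraph buffer
/// and returns the byte range it occupies, so the layout can later map
/// that range back to the object.
pub fn push_object_placeholder(buf: &mut String, dir: ObjectDirection) -> Range<usize> {
    let start = buf.len();
    match dir {
        ObjectDirection::Neutral => buf.push(OBJECT_REPLACEMENT),
        ObjectDirection::Ltr => buf.extend([LRI, OBJECT_REPLACEMENT, PDI]),
        ObjectDirection::Rtl => buf.extend([RLI, OBJECT_REPLACEMENT, PDI]),
    }
    start..buf.len()
}
```

Each of these code points is three bytes in UTF-8, so a neutral placeholder occupies three bytes of the buffer and an isolated one occupies nine.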

I'm going to assume further discussion is based on some sort of tree structure because that's the general case and the more difficult one to solve. Both cosmic-text and parley already do a pretty good job with attributed text.

So given a tree structure, we can define two processing stages that need to occur in the "front end" phase of layout:

  1. Style resolution: agnostic to text content so this can definitely be done in a separate pass and retained. Would keeping a Vec<ComputedStyle> that subsequent stages can reference by index be good enough or do we need to traitify this? I'd prefer to keep it simple if possible but am open to additional abstraction.

  2. Flattening: we need to accumulate the content of the text nodes into a contiguous buffer (just a String currently) at paragraph granularity in order to deal with bidi, segmentation and shaping. Doing this also requires very careful bookkeeping of source ids and ranges so they can be accurately reported from the final layout object. I don't see a great way to avoid recomputing this when the text is modified. We could offer a sort of retained mode layout but I feel like keeping a full shadow tree is probably wasteful and has potential to get out of sync.

The API I posted above handles both of these simultaneously but I don't see why the functionality couldn't be exposed separately.
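
To illustrate the bookkeeping that flattening needs, here is a minimal sketch that accumulates text nodes into one contiguous buffer while recording, per fragment, the source id, the source offset and an index into a resolved-style list (the Vec<ComputedStyle> idea from stage 1). `Fragment`, `Flattened`, `push_text` and `source_of` are hypothetical names; this is not the proof-of-concept implementation.

```rust
use std::ops::Range;

/// User defined identifier for a text element (as in the API above).
pub type TextId = u64;

/// One text node's contribution to the flattened paragraph buffer.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Fragment {
    pub id: TextId,
    /// Offset in the source text where this fragment begins.
    pub source_start: usize,
    /// Byte range this fragment occupies in the flattened buffer.
    pub buffer_range: Range<usize>,
    /// Index into a separately retained list of resolved styles.
    pub style_index: usize,
}

#[derive(Debug, Default)]
pub struct Flattened {
    pub text: String,
    pub fragments: Vec<Fragment>,
}

impl Flattened {
    /// Appends a text node's content and records where it came from.
    pub fn push_text(&mut self, id: TextId, source_start: usize, style_index: usize, text: &str) {
        let start = self.text.len();
        self.text.push_str(text);
        self.fragments.push(Fragment {
            id,
            source_start,
            buffer_range: start..self.text.len(),
            style_index,
        });
    }

    /// Maps a byte offset in the flattened buffer back to
    /// (source id, byte offset within that source).
    pub fn source_of(&self, buffer_offset: usize) -> Option<(TextId, usize)> {
        // Fragments are appended in buffer order, so binary search works.
        let idx = self
            .fragments
            .partition_point(|f| f.buffer_range.end <= buffer_offset);
        let f = self.fragments.get(idx)?;
        if f.buffer_range.contains(&buffer_offset) {
            Some((f.id, f.source_start + (buffer_offset - f.buffer_range.start)))
        } else {
            None
        }
    }
}
```

Because the fragment list depends on the text content, it has to be rebuilt when text changes, while the style list it indexes into could be retained across rebuilds.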

> And potentially the "resolution" functionality (range and tree based) could be spun out into a standalone library (operating over trait(s)).

I'm a slightly hesitant +1 on spinning out the style components. I'd like to see this happen but my preference is to get a solid concrete implementation working first.

> One way in which this could be improved is if the style resolution could operate over a custom tree

I'm not opposed to adding a pull-based API but I think that can be built on top of the push API. We need to flatten the text nodes and track references through shaping anyway so having access to the client's tree doesn't save us much.

> Do you have an estimate for when this might be available? I feel like it's going to be difficult for me to contribute to parley while large chunks of it (both this and fount -> fontique) are liable to be replaced (Xilem has a similar problem with Masonry).

Apologies for this. On Zulip, I requested two weeks to make some potentially sweeping changes. I'll see what I can get done in that timeframe. After that, I'll just do a handoff to linebender regardless of the current state and we can address any further changes through the normal PR/review process.

@dfrg
Collaborator

dfrg commented Mar 25, 2024

Just wanted to add: doing incremental updates to layout in general and inline flow in particular is a known hard problem. If we can nail it, that's great but I don't think we should spend a great deal of time on it as a first pass. I'd rather focus on making the full rebuild case as fast as possible and then tackle the incremental case later if necessary.

@dfrg
Collaborator

dfrg commented Apr 1, 2024

I've done a lot of thinking (and a little bit of coding) on this so I thought I'd write down a few more thoughts.

If we're considering real CSS typesetting, the main sources of complexity are tracking tree structure and handling merging and splitting of text fragments. The combination of these two drives up the difficulty level fairly quickly. I've tried to identify the points in the pipeline where these come into play:

  • For an actual DOM/CSS inline formatting context, handling white-space-collapse and text-transform requires keeping track of where contractions and expansions occur in the text nodes. Chrome does this with an offset mapping data structure. I've prototyped a working version of this and the memory cost is fairly high, especially if you construct pathological cases, but any browser-like layout needs to handle it. This isn't currently on our critical path but I wanted to understand how this might fit into parley.
  • During itemization, we need to split fragments at script/bidi level boundaries at least. During shaping, we want to greedily merge fragments when possible. Consider the HTML: <span>a</span><span>&acute;</span>. We should be able to correctly shape this into á if the associated styles for each span do not block merging.
  • Line box layout is complicated when nodes have non-zero margin, border or padding, which are all honored in the inline direction. Bidi makes this even more fun since the shuffling can break depth-first ordering, causing some really interesting placement of borders. I think this can be dealt with fairly simply by adding elements for open/close tags with appropriate bidi levels to the item stream and just letting reordering do its thing.
  • I haven't really thought at all about floats but my assumption is that those are fairly easy to handle if the above issues are resolved.
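
To give a feel for the offset-mapping problem in the first bullet, here is a deliberately naive sketch: it collapses runs of whitespace to a single space and keeps one source offset per output byte. This is neither Chrome's data structure nor the prototype mentioned above, it ignores most of the CSS white-space rules (for example trimming at line edges), and `collapse_whitespace` is a hypothetical name. The dense per-byte map also shows why memory cost becomes a concern.

```rust
/// Collapses each run of spaces, tabs and newlines into a single space.
///
/// Returns the collapsed text plus a map with one entry per output byte:
/// `map[i]` is the byte offset in `src` that output byte `i` came from.
/// For a collapsed run, the single space maps to the start of the run.
pub fn collapse_whitespace(src: &str) -> (String, Vec<usize>) {
    let mut out = String::with_capacity(src.len());
    let mut map = Vec::with_capacity(src.len());
    let mut prev_was_space = false;

    for (i, ch) in src.char_indices() {
        let is_space = matches!(ch, ' ' | '\t' | '\n' | '\r');
        if is_space {
            // Only the first character of a whitespace run is emitted.
            if !prev_was_space {
                out.push(' ');
                map.push(i);
            }
        } else {
            out.push(ch);
            // Multi-byte characters get one map entry per UTF-8 byte.
            for k in 0..ch.len_utf8() {
                map.push(i + k);
            }
        }
        prev_was_space = is_space;
    }
    (out, map)
}
```

A cursor position or selection range computed on the collapsed text can then be translated back to the source text by indexing into the map.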

My goal over the next week is to experiment with these things and see how parley can be modified to accommodate them. I can't allocate any actual work hours to this right now so my time is limited. Regardless, I've set a target of April 8 to do a transfer to linebender so progress is not blocked directly on me. We can move forward from there.

@nicoburns
Contributor Author

Really awesome to see that you're thinking about this stuff. I am definitely interested in implementing fully-compliant CSS typesetting. Having said that, as things stand I would happily make do with something close-but-not-quite. And would like to put out there that (as someone building a web renderer) my main priorities are:

  1. Mixed content: the ability to mix images/boxes/widgets inline with text (i.e. the original topic of this issue). Without this large swathes of web content are just irredeemably broken.

  2. Handling padding/border/margin on spans of text. This is less critical but IME is pretty widely used on the web.

Even with just 1, I feel like that could take me from "most of the web is pretty broken" to "most of the web is somewhat resembling what it's supposed to look like" and for text layout to no longer be such a critical blocker.

Things like whitespace collapsing and text-transform will be important long-term but seem like they could be emulated relatively well by pre-processing content before passing it into parley. Merging fragments for shaping seems like it is needed for completeness but also something that is likely to be relatively rare in real-world content (surely nobody would generate content like that on purpose - it seems likely that it would mostly be the output of badly behaving WYSIWYG tools?).

@nicoburns
Contributor Author

nicoburns commented Apr 2, 2024

Regarding floats: I have thought about this a little, so I thought I'd do a brain dump of my own.

My understanding thus far (with the caveat that there may be edge cases I haven't come across yet) is that text layout can primarily think of floats simply as excluded regions (specifically rectangles) that text should not be laid out into.

The annoying float-specific complexity being that those regions are not known ahead of time and are generated/placed when the floated box is encountered in the text stream. And that this can require glyphs in the current line to be retroactively moved out of the floated region. The good news being that:

  • this will never affect already "committed" lines as a float is never placed above the current line's y-position (possibly with an exception for floats at text position 0, but in that case there are no already-placed glyphs anyway)

  • this will never cause extra line-breaks in the already placed text (because in that case the float is pushed down to the next line instead)

So float layout basically involves laying out around excluded areas + sometimes shifting all glyphs in the current line towards the end of the line.

The actual placement of the floated box is quite complicated, but AFAIK that doesn't depend on text layout or glyph positions other than "what is the Y offset of the current line" (it also depends on "block formatting context" container size and the position of other floats) so that could potentially live outside of parley (IMO that might make quite a nice standalone micro-library).
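
The "excluded rectangles" view can be sketched like this: given the floats placed so far, compute the horizontal span still available to a line at a given y-position. This is only a sketch under simplifying assumptions (left-to-right text, floats attached to the left or right edge of the container, a single available span per line); `Rect`, `Side`, `Float` and `available_span` are hypothetical names, and float placement itself is out of scope.

```rust
/// Axis-aligned rectangle in container coordinates.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x0: f32,
    pub y0: f32,
    pub x1: f32,
    pub y1: f32,
}

/// Which edge of the container a float is attached to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Side {
    Left,
    Right,
}

/// An already-placed float, treated purely as an excluded region.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Float {
    pub rect: Rect,
    pub side: Side,
}

/// Returns the (start, end) x-range available to a line occupying
/// `line_y .. line_y + line_height`, after subtracting every float
/// that vertically overlaps the line.
pub fn available_span(
    floats: &[Float],
    container_width: f32,
    line_y: f32,
    line_height: f32,
) -> (f32, f32) {
    let (mut start, mut end) = (0.0_f32, container_width);
    let (top, bottom) = (line_y, line_y + line_height);
    for f in floats {
        let overlaps = f.rect.y0 < bottom && f.rect.y1 > top;
        if !overlaps {
            continue;
        }
        match f.side {
            Side::Left => start = start.max(f.rect.x1),
            Side::Right => end = end.min(f.rect.x0),
        }
    }
    // Never report a negative-width span.
    (start, end.max(start))
}
```

When a float is placed mid-line, the line breaker would recompute this span for the current line and shift the glyphs already on it, which matches the observation above that committed lines are never affected.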

@nicoburns nicoburns linked a pull request May 30, 2024 that will close this issue
@nicoburns
Contributor Author

@dfrg Having come to implementing web inline layout in Blitz using Parley, I'm definitely seeing the benefit of the tree-based model. Indeed I suspect it would probably be easier to just go ahead and implement this in Parley than try to "lower" a DOM tree down to Parley's existing API (seems like you might already have been of this opinion).

Your comment above makes it sound like you might actually have some (unfinished) code implementing this? I'm probably going to look into implementing this tomorrow. I have a pretty good idea of how I might do that (and your notes above are making a lot more sense now I'm actually working with tree-shaped input data), but a partial implementation might well be a useful reference if you have one sitting around.
