RFC: Font system redesign #32033

Open
mrobinson opened this issue Apr 9, 2024 · 12 comments
Labels
B-RFC A request for comments on a proposal

Comments

@mrobinson
Member

mrobinson commented Apr 9, 2024

Current system

Currently Servo uses a single FontCacheThread, which is responsible for loading the system font list and also storing the available web fonts. In addition, every layout thread stores a FontContext which caches all font data structures. When the data structure is not in the thread-local FontContext cache, IPC messages are used to request the data from the FontCacheThread.

FontCacheThread

The FontCacheThread is a long-running background thread that is responsible for reading the system font lists and storing data for web fonts. It also manages WebRender font primitives, sending them to the layout-thread-specific FontContext. Some of the data structures it manages:

  • FontIdentifier: Unique platform-specific identifier for a source for creating a font face. For local fonts it contains a path to a local font (or data used to find a path) and possibly a variation index. For web fonts, this is a URL. Multiple FontTemplates might have the same FontIdentifier.
  • FontDescriptor: Describes a font's properties (weight, stretch, style).
  • FontTemplateData: Platform-specific data for a font. Contains Arc<Vec<u8>> bytes, a FontIdentifier and, in the case of macOS, a CTFont cache.
  • FontTemplate: "All the information needed to create font instance handles." This includes a FontIdentifier, a FontDescriptor, and the FontTemplateData (both strong and weak references).
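
As a rough illustration of how these types relate, here is a simplified sketch. The field names, the `data` method and the `load` callback are assumptions made for this example and are not Servo's actual definitions. It shows why two templates that point at the same file each end up holding their own copy of the bytes.

```rust
use std::path::PathBuf;
use std::sync::{Arc, Weak};

/// Where a font face comes from (simplified, hypothetical layout).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum FontIdentifier {
    /// A local font: a path plus an optional variation index.
    Local { path: PathBuf, variation_index: Option<u32> },
    /// A web font, identified by its URL.
    Web(String),
}

/// Font properties used for matching.
#[derive(Clone, Debug, PartialEq)]
pub struct FontDescriptor {
    pub weight: f32,
    pub stretch: f32,
    pub italic: bool,
}

/// The loaded bytes for one template.
pub struct FontTemplateData {
    pub identifier: FontIdentifier,
    pub bytes: Arc<Vec<u8>>,
}

/// Everything needed to create font instances, with data owned per template.
pub struct FontTemplate {
    pub identifier: FontIdentifier,
    pub descriptor: Option<FontDescriptor>,
    pub strong_ref: Option<Arc<FontTemplateData>>,
    pub weak_ref: Option<Weak<FontTemplateData>>,
}

impl FontTemplate {
    pub fn new(identifier: FontIdentifier) -> Self {
        Self { identifier, descriptor: None, strong_ref: None, weak_ref: None }
    }

    /// Returns this template's data, calling `load` only if it is not already held.
    pub fn data(&mut self, load: impl FnOnce(&FontIdentifier) -> Vec<u8>) -> Arc<FontTemplateData> {
        if let Some(data) = self.weak_ref.as_ref().and_then(|weak| weak.upgrade()) {
            return data;
        }
        let data = Arc::new(FontTemplateData {
            identifier: self.identifier.clone(),
            bytes: Arc::new(load(&self.identifier)),
        });
        self.weak_ref = Some(Arc::downgrade(&data));
        self.strong_ref = Some(data.clone());
        data
    }
}
```

Because the data hangs off each FontTemplate, two templates created from the same FontIdentifier call `load` independently.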

FontContext

The FontContext is the per-layout-thread store of instantiated fonts and cached FontTemplates. When a layout thread is doing font matching, it asks the FontContext to find templates that match a given descriptor. These requests are forwarded to the FontCacheThread via IPC and the responses are cached. When sending FontTemplates to FontContext the FontCacheThread does not serialize font data, as that would be too expensive. Instead it serializes all of the members apart from the data into SerializedFontTemplate and then sends the data via a direct write into an IPC channel. The FontContext is also responsible for turning FontTemplates into concrete Fonts which hold platform-specific data structures via their FontHandle member and other instance information.
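
The request-and-cache flow described above might look roughly like the following. This is only a sketch: `std::sync::mpsc` channels stand in for Servo's IPC channels, templates are reduced to plain strings, and `TemplateRequest`, `spawn_font_cache` and `round_trips` are names invented for the example.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// A request sent to the font cache thread (hypothetical message shape).
pub struct TemplateRequest {
    pub family: String,
    pub reply: Sender<Option<String>>,
}

/// Spawns a stand-in font cache thread that answers requests from a fixed font list.
pub fn spawn_font_cache(known: HashMap<String, String>) -> Sender<TemplateRequest> {
    let (sender, receiver) = channel::<TemplateRequest>();
    thread::spawn(move || {
        for request in receiver {
            // Ignore send errors: the requester may have gone away.
            let _ = request.reply.send(known.get(&request.family).cloned());
        }
    });
    sender
}

/// Per-thread cache in front of the font cache thread.
pub struct FontContext {
    cache_thread: Sender<TemplateRequest>,
    cached: HashMap<String, Option<String>>,
    pub round_trips: usize,
}

impl FontContext {
    pub fn new(cache_thread: Sender<TemplateRequest>) -> Self {
        Self { cache_thread, cached: HashMap::new(), round_trips: 0 }
    }

    /// Looks up a template, blocking on the cache thread only on a local miss.
    pub fn find_template(&mut self, family: &str) -> Option<String> {
        if let Some(hit) = self.cached.get(family) {
            return hit.clone();
        }
        let (reply, response) = channel();
        self.cache_thread
            .send(TemplateRequest { family: family.to_owned(), reply })
            .unwrap();
        let template = response.recv().unwrap();
        self.round_trips += 1;
        self.cached.insert(family.to_owned(), template.clone());
        template
    }
}
```

Each FontContext pays one blocking round trip per family and then serves repeats locally, which is also why every context ends up with its own copy of the results.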

Problems with the current system

  • Web fonts are global: Web fonts are associated with family names in the
    global FontCacheThread. Here's an example:

     <!DOCTYPE html>
     <iframe src="font-iframe-1.html"></iframe>
     <iframe src="font-iframe-2.html"></iframe>

    font-iframe-1.html:

    <!DOCTYPE html>
    <style>
    @font-face {
        font-family: 'SpecificFontName';
        src: url(https://fonts.gstatic.com/s/firasans/v17/va9B4kDNxMZdWfMOD5VnLK3eRhf6Xl7Glw.woff2) format('woff2');
    }
    div { font-size: 30px; font-family: "SpecificFontName" }
    </style>
    <div>Hello!</div>

    font-iframe-2.html:

    <!DOCTYPE html>
    <style>
    div { font-size: 30px; font-family: "SpecificFontName" }
    </style>
    <div>Hello!</div>

    If you load the first page and then reload, both iframes will use the same font, even though it should only be available on the first page.

  • Font data is copied to each layout thread: After "Change font serialisation to use IpcBytesReceiver" (#28736), font data is no longer serialized and deserialized via IPC, but it is still copied to each layout thread and cached there. There is no sharing of data between threads.

  • Font data is loaded more than once for fonts that share files: When more than one font is stored in the same file, the file is read from disk multiple times, even if the font cache has just read that file. This is because font data is stored on FontTemplate, but fonts with different names that share files have different FontTemplates.

  • Font data for web fonts is never unloaded: Font data for web fonts is stored in the global font cache thread, so it is never unloaded -- even when moving between pages.

Redesign

The idea of the new system is that all font data structures will be both Sync and Send. There will still be a font cache thread for long-running operations such as loading platform font lists and sanitizing web fonts, but in general there will be global stores of font information protected by RwLock, much like how the HTTP cache works in Servo. In the optimal path, many threads can have access to font data without having to do IPC with the font cache thread at all, only blocking when more than one thread is trying to lazily construct the same font resource at the same time.

FontDataStore

There will be a global FontDataStore which keeps a handle on the loaded byte data of fonts. This will include the bytes of both system fonts (keyed by file path) and web fonts (keyed by URL). The font data store will keep data for fonts that are in use and also keep an LRU cache to enable font data sharing. When font data no longer has any outside references, it will be kept in memory by the cache. If a font with no outside references is evicted from the cache, its data will be freed. The reason this is stored separately from FontTemplate is to ensure that FontTemplates sharing a file on the system or a URL can use the same font data.
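
One possible shape for such a store is sketched below, assuming a simple `VecDeque`-based recency list. `FontDataKey`, `get_or_load` and `capacity` are hypothetical names for this example, not a proposed API. Weak handles track data that is still referenced elsewhere, while the recency list holds strong handles so that recently used data survives even without outside references.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, RwLock, Weak};

/// Key for font bytes: a file path for system fonts, a URL for web fonts.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum FontDataKey {
    Path(String),
    Url(String),
}

#[derive(Default)]
struct Inner {
    /// Weak handles to all data that is still alive somewhere.
    live: HashMap<FontDataKey, Weak<Vec<u8>>>,
    /// Strong handles to recently used data, oldest first.
    recent: VecDeque<(FontDataKey, Arc<Vec<u8>>)>,
}

pub struct FontDataStore {
    inner: RwLock<Inner>,
    capacity: usize,
}

impl FontDataStore {
    pub fn new(capacity: usize) -> Self {
        Self { inner: RwLock::new(Inner::default()), capacity }
    }

    /// Returns shared data for `key`, calling `load` only if no live copy exists.
    pub fn get_or_load(&self, key: FontDataKey, load: impl FnOnce() -> Vec<u8>) -> Arc<Vec<u8>> {
        let mut inner = self.inner.write().unwrap();
        let existing = inner.live.get(&key).and_then(|weak| weak.upgrade());
        let data = match existing {
            Some(data) => data,
            None => {
                let data = Arc::new(load());
                inner.live.insert(key.clone(), Arc::downgrade(&data));
                data
            }
        };
        // Move the entry to the most-recently-used end of the list.
        inner.recent.retain(|(k, _)| *k != key);
        inner.recent.push_back((key, data.clone()));
        // Evict the oldest entries; unreferenced data is freed when its Arc drops.
        while inner.recent.len() > self.capacity {
            inner.recent.pop_front();
        }
        // Forget keys whose data has been freed.
        inner.live.retain(|_, weak| weak.strong_count() > 0);
        data
    }
}
```

This sketch takes the write lock for every lookup to keep the recency list up to date. A real implementation would probably want a read-only fast path.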

FontStore

A font store will store FontTemplates for a list of available fonts, much like the FontCacheThread does now. There will be a single shared GlobalFontStore used for the system font list and also one FontStore per layout that is used to store web fonts. Each layout will have a separate FontTemplate for web fonts, but those templates will share the actual byte data through the FontDataStore. Each FontStore will also be protected by a RwLock so that it can be accessed by multiple threads without having to use IPC. Operations that mutate the FontStore will block all access to the FontStore.
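
A minimal sketch of this split follows, with one shared store for system fonts and one store per layout for web fonts. `LayoutFonts`, `add_template` and `find` are names assumed for this example, and templates are reduced to a family name and a source string.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// A heavily simplified template: just a family name and where it came from.
#[derive(Clone, Debug, PartialEq)]
pub struct FontTemplate {
    pub family: String,
    pub source: String,
}

/// A lock-protected list of templates, grouped by family name.
#[derive(Default)]
pub struct FontStore {
    families: RwLock<HashMap<String, Vec<Arc<FontTemplate>>>>,
}

impl FontStore {
    /// Mutation takes the write lock and blocks all other access.
    pub fn add_template(&self, template: FontTemplate) {
        let mut families = self.families.write().unwrap();
        families
            .entry(template.family.clone())
            .or_default()
            .push(Arc::new(template));
    }

    /// Lookups take only the read lock, so many threads can search at once.
    pub fn find(&self, family: &str) -> Vec<Arc<FontTemplate>> {
        self.families.read().unwrap().get(family).cloned().unwrap_or_default()
    }
}

/// What one layout sees: the shared system store plus its own web fonts.
pub struct LayoutFonts {
    system: Arc<FontStore>,
    web: FontStore,
}

impl LayoutFonts {
    pub fn new(system: Arc<FontStore>) -> Self {
        Self { system, web: FontStore::default() }
    }

    pub fn add_web_font(&self, template: FontTemplate) {
        self.web.add_template(template);
    }

    /// Web fonts declared by this layout win over system fonts of the same name.
    pub fn find(&self, family: &str) -> Vec<Arc<FontTemplate>> {
        let web = self.web.find(family);
        if web.is_empty() { self.system.find(family) } else { web }
    }
}
```

With this arrangement, a web font registered by one layout is invisible to another, which addresses the iframe example from the problem list.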

FontTemplate

The design of FontTemplate will remain very similar to the current design, but FontTemplateData will be integrated into the FontTemplate itself. In addition, FontTemplates will be wrapped in Arc<RwLock<FontTemplate>> which will allow sharing templates across threads and lazily initializing data and descriptors from any place in the code. There will no longer need to be strong and weak references to font data, which are always strong currently in any case, because dropping the layout-specific FontStore should automatically clean up page-specific font data. In the future, it may make sense to cache FontStores for back and forward navigation.
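
Lazy initialization behind Arc<RwLock<FontTemplate>> could work along these lines. This is a sketch under assumptions: the `descriptor` helper and the `parse` callback are invented for illustration. Readers take the shared lock first and only fall back to the exclusive lock when the value is missing.

```rust
use std::sync::{Arc, RwLock};

#[derive(Clone, Debug, PartialEq)]
pub struct FontDescriptor {
    pub weight: u16,
    pub italic: bool,
}

pub struct FontTemplate {
    pub identifier: String,
    descriptor: Option<FontDescriptor>,
}

impl FontTemplate {
    pub fn new(identifier: &str) -> Self {
        Self { identifier: identifier.to_owned(), descriptor: None }
    }
}

pub type FontTemplateRef = Arc<RwLock<FontTemplate>>;

/// Returns the template's descriptor, computing it at most once across all threads.
pub fn descriptor(
    template: &FontTemplateRef,
    parse: impl FnOnce(&str) -> FontDescriptor,
) -> FontDescriptor {
    // Fast path: a shared read lock, released at the end of this statement.
    let cached = template.read().unwrap().descriptor.clone();
    if let Some(descriptor) = cached {
        return descriptor;
    }
    // Slow path: take the write lock and check again, because another thread
    // may have filled in the descriptor between the two lock acquisitions.
    let mut guard = template.write().unwrap();
    if guard.descriptor.is_none() {
        let parsed = parse(&guard.identifier);
        guard.descriptor = Some(parsed);
    }
    guard.descriptor.clone().unwrap()
}
```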

Font

In general, Font does not change very much either. The biggest change here is that FontHandle will become both Sync and Send and FontRef will become Arc<RwLock<Font>> so it can be shared across threads. If any locking needs to happen on FontHandle due to platform threading issues, that will happen in the platform-specific FontHandle code. In general, platform font data structures can be used across threads (see footnotes 1 and 2).
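
To make the Send and Sync requirement concrete, here is a small sketch in which the compiler checks that a FontRef can cross threads. `PlatformFont`, `glyphs_shaped` and `shape_on_workers` are placeholders invented for the example; a real platform handle wrapping raw pointers would need its own audited `unsafe impl Send`/`Sync`.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Placeholder for the platform-specific handle.
pub struct PlatformFont {
    pub units_per_em: u32,
}

pub struct Font {
    pub handle: PlatformFont,
    pub glyphs_shaped: u64,
}

pub type FontRef = Arc<RwLock<Font>>;

/// Fails to compile if `T` cannot be shared across threads.
fn assert_send_sync<T: Send + Sync>() {}

/// Has every worker use the same font, then reports how many uses were recorded.
pub fn shape_on_workers(font: &FontRef, workers: usize) -> u64 {
    assert_send_sync::<FontRef>();
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let font = Arc::clone(font);
            thread::spawn(move || {
                font.write().unwrap().glyphs_shaped += 1;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    font.read().unwrap().glyphs_shaped
}
```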

Rough plan

  1. Move FontTemplateData directly into FontTemplate and wrap FontTemplate in a shared mutability and ownership data structure.
  2. Add a FontDataStore to enable font data sharing across threads.
  3. Move shared code from FontHandle into Font and rename FontHandle to PlatformFont.
  4. Make Font both Send and Sync and wrap them in Arc instead of Rc so they can be shared across threads and turn FontContext into a per-layout FontStore and wrap it in RwLock.
  5. Separate out the font lists into a separate GlobalFontStore data structure that can be accessed from multiple threads protected via RwLock.
  6. Make FontStore responsible for handling completed web font loads and storing templates for them.

Footnotes

  1. "All individual functions in Core Text are thread-safe. Font objects (CTFont, CTFontDescriptor, and associated objects) can be used simultaneously by multiple operations, work queues, or threads." from https://developer.apple.com/documentation/coretext/

  2. "[Since 2.5.6] In multi-threaded applications it is easiest to use one FT_Library object per thread. In case this is too cumbersome, a single FT_Library object across threads is possible also, as long as a mutex lock is used around FT_New_Face and FT_Done_Face." from https://freetype.org/freetype2/docs/reference/ft2-library_setup.html.

@mrobinson mrobinson added B-RFC A request for comments on a proposal C-untriaged New issues that haven't been triaged yet and removed C-untriaged New issues that haven't been triaged yet labels Apr 9, 2024
mrobinson added a commit to mrobinson/servo that referenced this issue Apr 10, 2024
The `FontContextHandle` was really only used on FreeType platforms to
store the `FT_Library` handle to use for creating faces. Each
`FontContext` and `FontCacheThread` would create its own
`FontContextHandle`. This change removes this data structure in favor of
a mutex-protected shared `FontContextHandle` for an entire Servo
process. The handle is initialized using a `OnceLock` to ensure that it
only happens once and also that it stays alive for the entire process
lifetime.

In addition to greatly simplifying the code, this will make it possible
for different threads to share platform-specific `FontHandle`s, avoiding
multiple allocations for a single font.

The only downside to all of this is that memory usage of FreeType fonts
isn't measured (though the mechanism is still there). This is because
the `FontCacheThread` currently doesn't do any memory measurement.
Eventually this *will* happen though, during the font system redesign.
In exchange, this should reduce the memory usage since there is only a
single FreeType library loaded into memory now.

This is part of servo#32033.
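
The pattern this commit message describes, a process-wide handle initialized once and guarded by a mutex, can be sketched as follows. `FreeTypeLibrary` is a stand-in struct rather than the real `FT_Library` binding, and `library` and `new_face` are hypothetical helper names.

```rust
use std::sync::{Mutex, MutexGuard, OnceLock};

/// Stand-in for the FreeType library handle. A real wrapper around a raw
/// `FT_Library` pointer would also need an `unsafe impl Send`.
pub struct FreeTypeLibrary {
    pub faces_created: usize,
}

/// One handle for the whole process, created on first use and never dropped.
static LIBRARY: OnceLock<Mutex<FreeTypeLibrary>> = OnceLock::new();

/// Locks and returns the shared library handle, initializing it if necessary.
pub fn library() -> MutexGuard<'static, FreeTypeLibrary> {
    LIBRARY
        .get_or_init(|| Mutex::new(FreeTypeLibrary { faces_created: 0 }))
        .lock()
        .unwrap()
}

/// Face creation happens while holding the lock, as FreeType's documentation
/// requires when one library object is shared across threads.
pub fn new_face() -> usize {
    let mut library = library();
    library.faces_created += 1;
    library.faces_created
}
```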
@gterzian
Member

I think it's an interesting idea and I have some questions, mostly about how this would play out when running in multiprocess mode.

The current proposal does not define how it would be impacted by process boundaries. The current font cache is found alongside the constellation in the "main process" of Servo (what other engines refer to as the chrome process, I think), whereas the users of the font cache would be partitioned by a BrowsingContextGroup as individual content-processes. This means we need to think about both what to share globally across Servo, by way of IPC and a central mechanism found in the main process, and what we can then share hierarchically with child content processes.

I can imagine a hierarchical structure, where we keep the core font mechanism inside the "main browser process", alongside the constellation and the resource loading mechanism it uses, but supplement it with local per-content-process caches. Those caches would follow something along the lines of the proposed design and allow for easy sharing between threads within a single content-process. I do wonder how many threads would benefit from this, though, after the removal of individual threads for layout (with #31346).

Lastly, I wonder if we should not rather partition further the current font cache, for security or privacy, for example per , meaning we also need to think about when not to share data across boundaries, such as only on a "per BC group"(Something like this will have to eventually happen for the HTTP cache, see whatwg/fetch#904).

mrobinson added a commit to mrobinson/servo that referenced this issue Apr 12, 2024
@mrobinson
Member Author

@gterzian This is a great point. The other consideration for the multiprocess case is sandboxing. I think the solution here is to have the FontCacheThread in the UI-process responsible for loading font lists and also font data. Still this would be handled as described in the design above (with the exception that the multiprocess case would proxy these requests via IPC to the main process). Likely all fonts would need to be created using byte buffers as well. I think this could mostly be transparent to most of the code.

github-merge-queue bot pushed a commit that referenced this issue Apr 12, 2024
@nicoburns
Contributor

In general, Font does not change very much either. The biggest change here is that FontHandle will become both Sync and Send and FontRef will become Arc<RefCell<Font>>

Perhaps Arc<AtomicRefCell<Font>>, Arc<Mutex<Font>> or Arc<RwLock<Font>>? I don't think Arc<RefCell<_>> ever makes sense.

@mrobinson
Member Author

Perhaps Arc<AtomicRefCell<Font>>, Arc<Mutex<Font>> or Arc<RwLock<Font>>? I don't think Arc<RefCell<_>> ever makes sense.

Yep, I meant Arc<RwLock<Font>> here. I've corrected the text above.

@gterzian
Member

gterzian commented Apr 19, 2024

Couple of more questions, I have looked further into the existing system, so I hope my questions are now more specific:

I think the solution here is to have the FontCacheThread in the UI-process responsible for loading font lists and also font data

With "the UI-process" do you mean the content-process (where script and layout runs)? If so that would mean running multiple font threads, one per content-process, which I think would defeat the purpose of increasing sharing. I would rather keep one central FontCache in the chrome process, and do IPC with per-content-process caches for all local layouts (the proposed FontStore).

Edit: I've just become aware the FontCacheThread is just a wrapper to a channel to the FontCache, so whenever I wrote FontCacheThread I actually meant "the thing that runs in a thread", which is the FontCache.

Regarding the specific problems that the re-design is meant to address:

  • Web fonts are global. Is this not something we can address with the desired partitioning at the level of the FontCache?
  • Font data is copied to each layout thread: Agreed on the need to share data between threads (the per-layout FontStore mentioned above), iff those threads would run in the same content-process. Also a question: since we don't have individual layout threads anymore, what threads are we talking about here? Would this per-layout FontStore not rather be some thread-local data structure shared by the various layouts?
  • Font data is loaded more than once for fonts that share files: can this be handled locally inside the current design? It appears to me like a question of switching the current pub data: Option<Arc<Vec<u8>>> into a pub data: Arc<HashMap<FontIdentifier, Option<Vec<u8>>>>, so that the optional data can be shared between multiple templates. It also appears to me this is not currently shared between threads, and per the above comment I don't think it should be, and so it could be an Rc.
  • Font data for web fonts is never unloaded: can this be handled with an "unload" messaging flow that fits in the current design? Perhaps in addition to some partitioning as per the first problem. For example, what is described as the "per-layout FontStore" could, when dropped or otherwise becoming unnecessary, send a message to the global FontCache for potential clean-up of the web font.

Finally, is the proposed FontStore, but the way I think it could be (so probably not actually shared between non-existing layout threads) not just the current FontContext?

I do see a problem with the current FontContext, especially if we don't run layout in threads anymore since this means it would block script as well, and that is the blocking IPC (the initial one that would miss the cache in the context) with the FontCache. Is there a way we can make those asynchronous using the IPC router and some appropriate state machine approach? It seems to me that the font context is used synchronously in the layout algorithm, so I guess the answer is "not easily", but perhaps something could be done via the script-thread as part of document load to populate the context for layout? Or perhaps layout should run without fonts, or with some sort of default one that would always be available, while they are being loaded?

@mrobinson
Member Author

With "the UI-process" do you mean the content-process (where script and layout runs)? If so that would mean running multiple font threads, one per content-process, which I think would defeat the purpose of increasing sharing. I would rather keep one central FontCache in the chrome process, and do IPC with per-content-process caches for all local layouts (the proposed FontStore).

The UI process is another name for the chrome process. "Chrome process" is Chrome/Chromium parlance while "UI process" is the term that WebKit uses. I think we ultimately need three tiers of data structure for fonts:

  • FontService: This lives in the UI process and would store font information global to all Servo processes. It would mainly hold system font lists and any data loaded for system fonts.
  • FontStore: This lives in the content process and holds cached system FontTemplates, Fonts, and cached font data which can be shared by all layouts.
  • LayoutFontStore: This lives in each layout and stores FontTemplates for web fonts. Web fonts templates and data can be shared between different LayoutFontStores and when the last reference is released they are automatically cleaned up.

* **Web fonts are global.** Is this not something we can address with the desired partitioning at the level of the `FontCache`?

I sort of addressed this above.

* **Font data is copied to each layout thread**: Agreed on the need to share data between threads (the per-layout `FontStore` mentioned above), iff those threads would run in the same content-process. Also a question: since we don't have individual layout threads anymore, what threads are we talking about here? Would this per-layout `FontStore` not rather be some thread-local data structure shared by the various layouts?

We don't have a layout thread for each layout, but layout does create worker threads and currently there is a FontStore for every worker. This work is about eliminating this font template and data duplication across worker threads, which is likely very expensive. Currently every worker talks directly to the global font cache!

* **Font data is loaded more than once for fonts that share files**: can this be handled locally inside the current design? It appears to me like a question of switching the current [`pub data: Option<Arc<Vec<u8>>>`](https://github.com/servo/servo/blob/21ea6d21f0f3bcb2e736082e397ef99cf9ecd051/components/gfx/font_template.rs#L99) into a `pub data: Arc<HashMap<FontIdentifier, Option<Vec<u8>>>>`, so that the optional data can be shared between multiple templates. It also appears to me this is not currently shared between threads, and per the above comment I don't think it should be, and so it could be an `Rc`.

It might be possible to handle it in the current design, but I think we should rethink things to unpack as much of our technical debt as possible. In my proposal this means a data structure that just concerns itself with caching font data. I think that will simplify a lot of things, because then all the other font data structures just have to care about holding Arc and the font data cache can clean up automatically using weak references.

* **Font data for web fonts is never unloaded**: can this be handled with an "unload" messaging flow that fits in the current design? Perhaps in addition to some partitioning as per the first problem. For example, what is described as the "per-layout FontStore" could, when dropped or otherwise becoming unnecessary, send a message to the global `FontCache` for potential clean-up of the web font.

I think this can be even simpler if we have a per-layout FontStore because cleaning up the LayoutFontStore automatically cleans up all web fonts. Eventually this will need to be better though, in order to handle font unloading for long-running single page applications that load and unload fonts from stylesheets.

Finally, is the proposed FontStore, but the way I think it could be (so probably not actually shared between non-existing layout threads) not just the current FontContext?

The big difference is that there is one FontContext for every layout worker thread and LayoutFontStore is per-layout.

I do see a problem with the current FontContext, especially if we don't run layout in threads anymore since this means it would block script as well, and that is the blocking IPC (the initial one that would miss the cache in the context) with the FontCache. Is there a way we can make those asynchronous using the IPC router and some appropriate state machine approach? It seems to me that the font context is used synchronously in the layout algorithm, so I guess the answer is "not easily", but perhaps something could be done via the script-thread as part of document load to populate the context for layout? Or perhaps layout should run without fonts, or with some sort of default one that would always be available, while they are being loaded?

This is an interesting question. Regarding your point about blocking script -- this is already the case and even was when we had the layout thread, because script always blocked while the layout thread performed layout. There was never any parallelism between script and layout (apart from some really minor things). I think we can never do a layout until we have at the very least loaded the system font list and loaded fonts necessary for a layout (even if falling back from in-process web fonts). Doing a layout before this happens will lead to flashes of badly laid out pages. It's better to show nothing than do that, I think.

That said, once those fonts are loaded once, there should probably never be synchronous calls to use them again (unless the system fonts change while Servo is running). Maybe we can look into pre-populating the per-content-process FontStore when creating them. I think that would make a lot of sense actually -- though if the new page uses unloaded fonts they would obviously need to be loaded.

@gterzian
Member

gterzian commented Apr 19, 2024

Thanks for clarifying, it mostly makes sense to me so far: the three-tier layered structure seems to make sense, although the need for one 'LayoutFontStore' per layout and specific for web fonts is not completely clear yet, but I guess it will be. Is it because the lifecycle of web fonts is different and to facilitate their unloading?

@mrobinson
Member Author

Thanks for clarifying, it mostly makes sense to me so far: the three-tier layered structure seems to make sense, although the need for one 'LayoutFontStore' per layout and specific for web fonts is not completely clear yet, but I guess it will be. Is it because the lifecycle of web fonts is different and to facilitate their unloading?

The reason that we need a LayoutFontStore per-layout is that a stylesheet can specify a web font with a URL, family name, and font properties. Even though the URL can be the same as the web fonts on other pages, for the purposes of matching the family name and font properties only apply to the page where the web font was included in the stylesheet. So even though a web font is shared each "instance" of the font is specific to a layout for matching.
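
A small sketch of that distinction, under assumed names (`SharedFontData`, `WebFontTemplate` and `add_font_face` are illustrative only): the bytes are shared by URL, while the family name and properties used for matching live in each layout's own store.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Font bytes shared between layouts, keyed by URL (stand-in for a shared data cache).
pub type SharedFontData = HashMap<String, Arc<Vec<u8>>>;

/// One `@font-face` declaration as seen by a single layout.
pub struct WebFontTemplate {
    pub family: String,
    pub weight: u16,
    pub data: Arc<Vec<u8>>,
}

#[derive(Default)]
pub struct LayoutFontStore {
    templates: Vec<WebFontTemplate>,
}

impl LayoutFontStore {
    /// Registers a web font for this layout only, reusing bytes already fetched for `url`.
    pub fn add_font_face(
        &mut self,
        shared: &mut SharedFontData,
        family: &str,
        weight: u16,
        url: &str,
        fetch: impl FnOnce() -> Vec<u8>,
    ) {
        let data = shared
            .entry(url.to_owned())
            .or_insert_with(|| Arc::new(fetch()))
            .clone();
        self.templates.push(WebFontTemplate { family: family.to_owned(), weight, data });
    }

    /// Matching only considers declarations made by this layout's stylesheets.
    pub fn find(&self, family: &str, weight: u16) -> Option<&WebFontTemplate> {
        self.templates
            .iter()
            .find(|template| template.family == family && template.weight == weight)
    }
}
```

Two pages can therefore declare the same URL under different family names and weights, share one copy of the bytes, and still not see each other's declarations.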

@mrobinson
Member Author

Ah, there's another issue that I wanted to bring up related to sandboxing as well. It seems impossible to send system font data over IPC or shared memory on macOS, because of the way the CoreText APIs work. Basically system TTC files need to be loaded from disk by CoreText for it to work properly. Due to this, we'll need to continue loading system fonts using CoreText, poking a hole in the sandbox for them. This is what Chromium does and I suspect we'll need to do the same thing.

@nicoburns
Contributor

I've looked into this a little more and I have some comments:

  • It would be good if the layout_2020 crate could be decoupled from the font store types. And especially from the font thread. Perhaps this could be abstracted behind a trait such that other people wishing to use the layout crate could provide their own font loading logic. Bundling shaping with layout seems reasonable, but the font loading seems like it ought to be separable.

  • I had thought that Servo was using servo/font-kit for font loading, but I see that font-kit is only used for <canvas> and that this functionality is provided by components/gfx. In fact components/gfx is almost all text and font related code. Do you know why this is?

  • The font loading and "store" code in gfx seems to have a lot of overlap with linebender's new fontique crate. I wonder if Servo would consider adopting that library? I feel like if we can get a unified "font database" abstraction across the Rust ecosystem then that would be a really good basis for interoperability. My understanding is that most (in memory) "font databases" are primarily storing the raw bytes of the font file (+ some metadata), so it seems like it ought to be possible to standardise.

@mrobinson
Member Author

mrobinson commented May 1, 2024

  • I had thought that Servo was using servo/font-kit for font loading, but I see that font-kit is only used for <canvas> and that this functionality is provided by components/gfx. In fact components/gfx is almost all text and font related code. Do you know why this is?

font-kit was created well after the font backend and it was never ported. In the meantime, font-kit is mostly abandoned and is mainly just a port of the old font code from Servo. It's not clear if it's going to be complete enough for Servo going forward. There is a dependency on font-kit in canvas, because raqote uses it, but I would be happy to drop the dependency entirely. I think it makes sense to bring the font system up to date with modern fonts and modern Servo and then see what the situation looks like.

  • The font loading and "store" code in gfx seems to have a lot of overlap with linebender's new fontique crate. I wonder if Servo would consider adopting that library? I feel like if we can get a unified "font database" abstraction across the Rust ecosystem then that would be a really good basis for interoperability. My understanding is that most (in memory) "font databases" are primarily storing the raw bytes of the font file (+ some metadata), so it seems like it ought to be possible to standardise.

fontique looks cool, but font enumeration is one of the simplest things that the font backend does right now. In addition, fallback is defined in the specification so we can't just do what the system requests. The enumeration code in fontique honestly looks a lot like ours (is it a port?) and lacks some of the features we have added in recent weeks. Generally speaking the requirements of a web browser go well beyond that of a normal application, so I do not have much faith in high-level libraries for fonts. We will be using things like skrifa very soon though.

Regarding font handles, Servo's requirements are very tricky due to sandboxing and IPC. I think there's no hope of using something like fontique for that.

@mrobinson
Member Author

It would be good if the layout_2020 crate could be decoupled from the font store types. And especially from the font thread. Perhaps this could be abstracted behind a trait such that other people wishing to use the layout crate could provide their own font loading logic. Bundling shaping with layout seems reasonable, but the font loading seems like it ought to be separable.

This would be nice to do, but fonts and web layout are very intertwined. The first priority should be that things work correctly. The details of the FontCache in Servo are already behind a FontSource trait and we do not plan on making anything less abstract.
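
For readers unfamiliar with the idea, hiding font loading behind a trait could look something like the sketch below. This is not Servo's actual FontSource trait; the `font_data` method, `InMemoryFontSource` and `first_available` are made up for illustration.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical abstraction over wherever font bytes come from.
pub trait FontSource {
    fn font_data(&mut self, family: &str) -> Option<Arc<Vec<u8>>>;
}

/// A source backed by an in-memory map, e.g. for tests or embedders.
#[derive(Default)]
pub struct InMemoryFontSource {
    fonts: HashMap<String, Arc<Vec<u8>>>,
}

impl InMemoryFontSource {
    pub fn insert(&mut self, family: &str, bytes: Vec<u8>) {
        self.fonts.insert(family.to_owned(), Arc::new(bytes));
    }
}

impl FontSource for InMemoryFontSource {
    fn font_data(&mut self, family: &str) -> Option<Arc<Vec<u8>>> {
        self.fonts.get(family).cloned()
    }
}

/// Layout-side code depends only on the trait, not on a concrete font cache.
pub fn first_available(source: &mut dyn FontSource, families: &[&str]) -> Option<Arc<Vec<u8>>> {
    families.iter().find_map(|family| source.font_data(family))
}
```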


3 participants