Improving bump estimation accuracy and performance #541

armansito opened this issue Apr 3, 2024 · 0 comments
armansito commented Apr 3, 2024

The bump buffer size estimation utility is currently integrated into the scene construction interface of the scene API. In this model, each Scene object owns a BumpEstimator that maintains intermediate tallies of encoded data for a single Encoding instance. When one Scene gets appended to another, the bump tallies of the appended scene fragment are added to those of the destination scene, after a heuristic scale factor derived from the provided transform is applied.
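As a rough sketch of the accumulation described above (the `BumpTally` type and the scaling rule here are illustrative, not vello's actual API), the tallies might combine like this:

```rust
// Hypothetical sketch of combining per-scene bump tallies. Counts that grow
// linearly with the transform scale (lines, segments) are scaled once; tile
// coverage grows with area, so it is scaled quadratically. This mirrors the
// heuristic idea in the issue, not vello's real BumpEstimator.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct BumpTally {
    lines: u32,
    segments: u32,
    tiles: u32,
}

impl BumpTally {
    /// Append another scene fragment's tallies after applying a
    /// heuristic scale factor derived from the append transform.
    fn append_scaled(&mut self, other: &BumpTally, scale: f64) {
        self.lines += (other.lines as f64 * scale).ceil() as u32;
        self.segments += (other.segments as f64 * scale).ceil() as u32;
        // Tile coverage scales roughly with area, hence scale squared.
        self.tiles += (other.tiles as f64 * scale * scale).ceil() as u32;
    }
}
```

The quadratic tile term is exactly the kind of heuristic that overestimates badly when scaled content lands outside the viewport, which motivates the drawbacks listed below.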

This design has some significant drawbacks:

  • The size of the render target is not known during scene construction. The segment and tile buffer counts are currently likely to be massively overestimated when objects are scaled up to have significant portions lie outside the viewport. The estimator should take this into account to discard culled objects. While the current estimate for the "line soup" buffer is independent of the render target, there are good reasons to apply viewport clipping and culling during the curve flattening stage. If we implement such a culling scheme, then the estimator should take the viewport size into account for the line soup estimate too.
  • The heuristic-based scaling is less accurate than relying on precisely transformed coordinates.
  • Glyphs and other shapes/resources that get resolved late are currently ignored, as estimating them at encoding time is tricky.
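To illustrate the first point above, a viewport-aware estimator could discard objects whose transformed bounding box lies entirely outside the render target. A minimal sketch with hypothetical types (vello's actual culling, if implemented, would live in the estimator/flattening path):

```rust
// Illustrative viewport-culling predicate for the estimator. An object whose
// transformed bounding box does not overlap the viewport contributes nothing
// to the segment/tile estimates. Names are hypothetical.
#[derive(Clone, Copy)]
struct Rect {
    x0: f64,
    y0: f64,
    x1: f64,
    y1: f64,
}

impl Rect {
    /// Standard axis-aligned rectangle overlap test.
    fn intersects(&self, other: &Rect) -> bool {
        self.x0 < other.x1 && other.x0 < self.x1 && self.y0 < other.y1 && other.y0 < self.y1
    }
}

/// Returns true if the object's transformed bounding box should count
/// toward the bump estimate at all.
fn contributes(bbox: &Rect, viewport: &Rect) -> bool {
    bbox.intersects(viewport)
}
```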

The most straightforward solution to this is to run the estimation during scene resource resolution (see vello_encoding::Resolver::resolve()). Resolution happens every time a scene gets rendered, at which point all fragment data, precise absolute transforms, late-bound resources, and render target parameters (most importantly the viewport dimensions) are available.
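Once the viewport dimensions are known at resolve time, per-object tile estimates can at minimum be clamped to the number of tiles the render target actually contains. A sketch, assuming 16x16-pixel tiles (the function names are hypothetical):

```rust
// With the render target known, a raw tile estimate for a scaled-up object
// can never usefully exceed the viewport's own tile count. Sketch only;
// the tile size assumption and names are illustrative.
const TILE_SIZE: u32 = 16;

/// Number of 16x16 tiles covering a width x height render target,
/// rounding partial tiles up.
fn viewport_tile_count(width: u32, height: u32) -> u32 {
    let w = (width + TILE_SIZE - 1) / TILE_SIZE;
    let h = (height + TILE_SIZE - 1) / TILE_SIZE;
    w * h
}

/// Clamp a raw per-object tile estimate so objects extending far past the
/// viewport no longer produce massive overestimates.
fn clamped_tile_estimate(raw_estimate: u32, width: u32, height: u32) -> u32 {
    raw_estimate.min(viewport_tile_count(width, height))
}
```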

This approach isn't without its own drawbacks. The BumpEstimator has a modest but non-zero cost on CPU-time performance. Measurements on the current integration show a 1.75x to 2.3x increase in encoding time when the bump_estimate feature is enabled. This impact can be significant for complex scenes.

This performance impact is likely hard to avoid (at least without optimizations) for dynamic scenes. However, there can be a significant advantage to avoiding this computation on every frame for a static scene (such as an SVG scene where the user only interacts with the transform) or for reused scene fragments. Given the trade-offs, I have the following thoughts / proposals:

  1. Move the estimation to resolve time so that the overestimation drawbacks above can be avoided. This would also allow the BumpEstimate results to be more seamlessly integrated with the Layout and RenderConfig structures that are used to set up a vello render.

  2. Make estimation optional so that a client can choose to reuse a prior estimate for a scene (fragment) that remains unmodified, with an optional transform that we can apply using today's heuristics. This is straightforward when the fragment doesn't undergo a transform. If the transform consists only of translations, most of the estimate remains valid unless the cull state changes (e.g. an object that was culled in the original estimate is brought back into view). Scales and rotations are best handled with a heuristic, and they are subject to the same culling limitations as translations.

    Supporting optional reuse gives the client some flexibility to avoid estimation, but the trade-offs need to be documented clearly.
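    The reuse decision above could be sketched as follows (the types and the scale heuristic are hypothetical, not a committed design): a pure translation keeps the cached counts valid modulo cull changes, while anything else falls back to a heuristic rescale.

```rust
// Sketch of the reuse decision for a cached estimate under a new transform.
// `Affine` holds the 2x3 affine coefficients [a b c d tx ty]; all names and
// the rescale heuristic are illustrative only.
#[derive(Clone, Copy)]
struct Affine {
    a: f64,
    b: f64,
    c: f64,
    d: f64,
    tx: f64,
    ty: f64,
}

enum Reuse {
    /// Identity or pure translation: cached counts stay valid
    /// (unless the cull state changes).
    AsIs,
    /// Scale/rotation present: apply a heuristic scale factor
    /// to the cached counts.
    Rescale(f64),
}

fn reuse_strategy(t: &Affine) -> Reuse {
    let is_translation = t.a == 1.0 && t.b == 0.0 && t.c == 0.0 && t.d == 1.0;
    if is_translation {
        Reuse::AsIs
    } else {
        // Rough heuristic: scale by the larger column norm of the
        // linear part, approximating the largest singular value.
        let sx = (t.a * t.a + t.b * t.b).sqrt();
        let sy = (t.c * t.c + t.d * t.d).sqrt();
        Reuse::Rescale(sx.max(sy))
    }
}
```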

  3. It may be possible to reduce the estimation overhead with CPU-side optimizations. The estimates are computed per path segment, and segments can be processed independently of each other. There are opportunities to achieve some parallelism using SIMD and multithreading.
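As a sketch of the multithreading angle in proposal 3, per-segment costs can be summed in parallel with scoped threads. The per-segment cost function below is a crude stand-in for the estimator's real flattening math, and a real implementation might use SIMD or a thread pool instead:

```rust
// Because per-segment estimates are independent, they can be computed in
// chunks on separate threads and summed. Uses std::thread::scope so the
// workers can borrow the segment slice. Illustrative only.
use std::thread;

/// Hypothetical per-segment cost: roughly, how many output lines a cubic
/// segment (4 control points, 8 floats) flattens into. Crude proxy based on
/// control-polygon length, standing in for the estimator's actual math.
fn segment_cost(seg: &[f32; 8]) -> u32 {
    let mut len = 0.0f32;
    for i in 0..3 {
        let dx = seg[2 * i + 2] - seg[2 * i];
        let dy = seg[2 * i + 3] - seg[2 * i + 1];
        len += (dx * dx + dy * dy).sqrt();
    }
    len.ceil() as u32 + 1
}

/// Sum segment costs across `workers` scoped threads.
fn estimate_parallel(segments: &[[f32; 8]], workers: usize) -> u32 {
    let chunk = ((segments.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = segments
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().map(segment_cost).sum::<u32>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```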

There may be other considerations. For example, if we move forward and integrate estimation into resolve time, some of the computations (such as bounding boxes) could be reused to apply culling on the CPU ahead of the GPU dispatches. This has the potential to reduce the overall memory requirements (especially at input assembly) in high-zoom cases.
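For instance, bounding boxes computed during resolve-time estimation could prune the draw list on the CPU before anything is dispatched to the GPU. A hypothetical sketch (the `Draw` type and viewport test are illustrative, not vello's encoding format):

```rust
// CPU-side pre-dispatch culling sketch: keep only draws whose transformed
// bounding box overlaps the viewport rectangle anchored at the origin.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Draw {
    /// Transformed bounding box as (x0, y0, x1, y1).
    bbox: (f32, f32, f32, f32),
    /// Hypothetical index into the resolved encoding.
    encoding_index: usize,
}

fn cull_draws(draws: &[Draw], viewport: (f32, f32)) -> Vec<Draw> {
    let (vw, vh) = viewport;
    draws
        .iter()
        .copied()
        // Overlap test against the [0, vw] x [0, vh] viewport.
        .filter(|d| d.bbox.2 > 0.0 && d.bbox.0 < vw && d.bbox.3 > 0.0 && d.bbox.1 < vh)
        .collect()
}
```

Draws removed here never reach input assembly, which is where the memory savings in high-zoom cases would come from.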

DJMcNab added a commit to waywardmonkeys/vello that referenced this issue Apr 23, 2024
github-merge-queue bot pushed a commit that referenced this issue Apr 23, 2024
* Impl `From<Encoding>` for `Scene`.

This allows creating a `Scene` with a pre-existing `Encoding`.

Fixes #530.

* Link to #541

---------

Co-authored-by: Daniel McNab <36049421+DJMcNab@users.noreply.github.com>