Improving bump estimation accuracy and performance #541

armansito opened this issue Apr 3, 2024 · 0 comments
armansito commented Apr 3, 2024

The bump buffer size estimation utility is currently integrated into the scene construction interface of the scene API. In this model, each Scene object owns a BumpEstimator that maintains intermediate tallies of encoded data for a single Encoding instance. When one Scene gets appended to another, the bump tallies of the appended scene fragment are added to those of the destination scene, after a heuristic scale factor derived from the provided transform is applied.
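As a rough sketch of the accumulation described above (the `BumpTally` type and the scaling rule here are illustrative, not vello's actual API), the tallies might combine like this:

```rust
// Hypothetical sketch of combining per-scene bump tallies. Counts that grow
// linearly with the transform scale (lines, segments) are scaled once; tile
// coverage grows with area, so it is scaled quadratically. This mirrors the
// heuristic idea in the issue, not vello's real BumpEstimator.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct BumpTally {
    lines: u32,
    segments: u32,
    tiles: u32,
}

impl BumpTally {
    /// Append another scene fragment's tallies after applying a
    /// heuristic scale factor derived from the append transform.
    fn append_scaled(&mut self, other: &BumpTally, scale: f64) {
        self.lines += (other.lines as f64 * scale).ceil() as u32;
        self.segments += (other.segments as f64 * scale).ceil() as u32;
        // Tile coverage scales roughly with area, hence scale squared.
        self.tiles += (other.tiles as f64 * scale * scale).ceil() as u32;
    }
}
```

The quadratic tile term is exactly the kind of heuristic that overestimates badly when scaled content lands outside the viewport, which motivates the drawbacks listed below.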

This design has some significant drawbacks:

  • The size of the render target is not known during scene construction. The segment and tile buffer counts are currently likely to be massively overestimated when objects are scaled up to have significant portions lie outside the viewport. The estimator should take this into account to discard culled objects. While the current estimate for the "line soup" buffer is independent of the render target, there are good reasons to apply viewport clipping and culling during the curve flattening stage. If we implement such a culling scheme, then the estimator should take the viewport size into account for the line soup estimate too.
  • The heuristic-based scaling is less accurate than relying on precisely transformed coordinates.
  • Glyphs and other shapes/resources that get resolved late are currently ignored, as estimating them at encoding time is tricky.
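To illustrate the first point above, a viewport-aware estimator could discard objects whose transformed bounding box lies entirely outside the render target. A minimal sketch with hypothetical types (vello's actual culling, if implemented, would live in the estimator/flattening path):

```rust
// Illustrative viewport-culling predicate for the estimator. An object whose
// transformed bounding box does not overlap the viewport contributes nothing
// to the segment/tile estimates. Names are hypothetical.
#[derive(Clone, Copy)]
struct Rect {
    x0: f64,
    y0: f64,
    x1: f64,
    y1: f64,
}

impl Rect {
    /// Standard axis-aligned rectangle overlap test.
    fn intersects(&self, other: &Rect) -> bool {
        self.x0 < other.x1 && other.x0 < self.x1 && self.y0 < other.y1 && other.y0 < self.y1
    }
}

/// Returns true if the object's transformed bounding box should count
/// toward the bump estimate at all.
fn contributes(bbox: &Rect, viewport: &Rect) -> bool {
    bbox.intersects(viewport)
}
```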

The most straightforward solution to this is to run the estimation during scene resource resolution (see vello_encoding::Resolver::resolve()). Resolution happens every time a scene gets rendered, at which point all fragment data, precise absolute transforms, late-bound resources, and render target parameters (most importantly the viewport dimensions) are available.
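Once the viewport dimensions are known at resolve time, per-object tile estimates can at minimum be clamped to the number of tiles the render target actually contains. A sketch, assuming 16x16-pixel tiles (the function names are hypothetical):

```rust
// With the render target known, a raw tile estimate for a scaled-up object
// can never usefully exceed the viewport's own tile count. Sketch only;
// the tile size assumption and names are illustrative.
const TILE_SIZE: u32 = 16;

/// Number of 16x16 tiles covering a width x height render target,
/// rounding partial tiles up.
fn viewport_tile_count(width: u32, height: u32) -> u32 {
    let w = (width + TILE_SIZE - 1) / TILE_SIZE;
    let h = (height + TILE_SIZE - 1) / TILE_SIZE;
    w * h
}

/// Clamp a raw per-object tile estimate so objects extending far past the
/// viewport no longer produce massive overestimates.
fn clamped_tile_estimate(raw_estimate: u32, width: u32, height: u32) -> u32 {
    raw_estimate.min(viewport_tile_count(width, height))
}
```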

This approach isn't without its own drawbacks. The BumpEstimator has a modest but non-zero cost on CPU-time performance. Measurements on the current integration show a 1.75x to 2.3x increase in encoding time when the bump_estimate feature is enabled. This impact can be significant for complex scenes.

This performance impact is likely hard to avoid (at least without optimizations) for dynamic scenes. However, there can be a significant advantage to avoiding this computation on every frame for a static scene (such as an SVG scene where the user only interacts with the transform) or for reused scene fragments. Given the trade-offs, I have the following thoughts / proposals:

  1. Move the estimation to resolve time so that the overestimation drawbacks above can be avoided. This would also allow the BumpEstimate results to be more seamlessly integrated with the Layout and RenderConfig structures that are used to set up a vello render.

  2. Make estimation optional so that a client can choose to reuse a prior estimate for a scene (fragment) that remains unmodified, with an optional transform that we can apply using today's heuristics. This is straightforward when the fragment doesn't undergo a transform. If the transform consists only of translations, most of the estimate remains valid unless the cull state changes (e.g. an object that was culled in the original estimate is brought back into view). Scales and rotations are best handled with a heuristic, and they are subject to the same culling limitations as translations.

    Supporting optional reuse gives the client some flexibility to avoid estimation, but the trade-offs need to be documented clearly.
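    The reuse decision above could be sketched as follows (the types and the scale heuristic are hypothetical, not a committed design): a pure translation keeps the cached counts valid modulo cull changes, while anything else falls back to a heuristic rescale.

```rust
// Sketch of the reuse decision for a cached estimate under a new transform.
// `Affine` holds the 2x3 affine coefficients [a b c d tx ty]; all names and
// the rescale heuristic are illustrative only.
#[derive(Clone, Copy)]
struct Affine {
    a: f64,
    b: f64,
    c: f64,
    d: f64,
    tx: f64,
    ty: f64,
}

enum Reuse {
    /// Identity or pure translation: cached counts stay valid
    /// (unless the cull state changes).
    AsIs,
    /// Scale/rotation present: apply a heuristic scale factor
    /// to the cached counts.
    Rescale(f64),
}

fn reuse_strategy(t: &Affine) -> Reuse {
    let is_translation = t.a == 1.0 && t.b == 0.0 && t.c == 0.0 && t.d == 1.0;
    if is_translation {
        Reuse::AsIs
    } else {
        // Rough heuristic: scale by the larger column norm of the
        // linear part, approximating the largest singular value.
        let sx = (t.a * t.a + t.b * t.b).sqrt();
        let sy = (t.c * t.c + t.d * t.d).sqrt();
        Reuse::Rescale(sx.max(sy))
    }
}
```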

  3. It may be possible to reduce the estimation overhead with CPU-side optimizations. The estimates are computed per path segment, and segments can be processed independently of each other. There are opportunities to achieve some parallelism using SIMD and multithreading.
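As a sketch of the multithreading angle in proposal 3, per-segment costs can be summed in parallel with scoped threads. The per-segment cost function below is a crude stand-in for the estimator's real flattening math, and a real implementation might use SIMD or a thread pool instead:

```rust
// Because per-segment estimates are independent, they can be computed in
// chunks on separate threads and summed. Uses std::thread::scope so the
// workers can borrow the segment slice. Illustrative only.
use std::thread;

/// Hypothetical per-segment cost: roughly, how many output lines a cubic
/// segment (4 control points, 8 floats) flattens into. Crude proxy based on
/// control-polygon length, standing in for the estimator's actual math.
fn segment_cost(seg: &[f32; 8]) -> u32 {
    let mut len = 0.0f32;
    for i in 0..3 {
        let dx = seg[2 * i + 2] - seg[2 * i];
        let dy = seg[2 * i + 3] - seg[2 * i + 1];
        len += (dx * dx + dy * dy).sqrt();
    }
    len.ceil() as u32 + 1
}

/// Sum segment costs across `workers` scoped threads.
fn estimate_parallel(segments: &[[f32; 8]], workers: usize) -> u32 {
    let chunk = ((segments.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = segments
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().map(segment_cost).sum::<u32>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```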

There may be other considerations. For example, if we move forward and integrate estimation into resolve time, some of the computations (such as bounding boxes) could be reused to apply culling on the CPU ahead of the GPU dispatches. This has the potential to reduce the overall memory requirements (especially at input assembly) in high-zoom cases.
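For instance, bounding boxes computed during resolve-time estimation could prune the draw list on the CPU before anything is dispatched to the GPU. A hypothetical sketch (the `Draw` type and viewport test are illustrative, not vello's encoding format):

```rust
// CPU-side pre-dispatch culling sketch: keep only draws whose transformed
// bounding box overlaps the viewport rectangle anchored at the origin.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Draw {
    /// Transformed bounding box as (x0, y0, x1, y1).
    bbox: (f32, f32, f32, f32),
    /// Hypothetical index into the resolved encoding.
    encoding_index: usize,
}

fn cull_draws(draws: &[Draw], viewport: (f32, f32)) -> Vec<Draw> {
    let (vw, vh) = viewport;
    draws
        .iter()
        .copied()
        // Overlap test against the [0, vw] x [0, vh] viewport.
        .filter(|d| d.bbox.2 > 0.0 && d.bbox.0 < vw && d.bbox.3 > 0.0 && d.bbox.1 < vh)
        .collect()
}
```

Draws removed here never reach input assembly, which is where the memory savings in high-zoom cases would come from.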

DJMcNab added a commit to waywardmonkeys/vello that referenced this issue Apr 23, 2024
github-merge-queue bot pushed a commit that referenced this issue Apr 23, 2024
* Impl `From<Encoding>` for `Scene`.

This allows creating a `Scene` with a pre-existing `Encoding`.

Fixes #530.

* Link to #541

---------

Co-authored-by: Daniel McNab <36049421+DJMcNab@users.noreply.github.com>