#12502 Remove limit on RenderLayers. #13317
Conversation
@james7132 I think I know how to use tracy; would you mind pointing me in the direction of what I should benchmark, or any documentation for this relative to rendering PRs?
```rust
{
    let gpu_light = &mut gpu_lights.directional_lights[light_index];
```
Minor, but `gpu_light` is only used inside the `if` block, right?
```rust
// Check if the light intersects with the view.
if !view_layers.intersects(&light.render_layers) {
    gpu_light.skip = 1u32;
```
So this makes it so that if the render layers of the view don't intersect the render layers of the light, then the light is disabled in that view. Makes sense.
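As a rough illustration of what that CPU-side check amounts to, here is a minimal sketch with stand-in types (not the PR's actual code; `RenderLayers` is simplified here to a single u64 mask, and all names besides `skip` and `intersects` are placeholders):

```rust
// Stand-in for the real component, reduced to a single 64-bit mask.
struct RenderLayers(u64);

impl RenderLayers {
    fn intersects(&self, other: &RenderLayers) -> bool {
        self.0 & other.0 != 0
    }
}

// Stand-ins for the CPU- and GPU-side light data.
struct Light {
    render_layers: RenderLayers,
}

struct GpuDirectionalLight {
    skip: u32, // read by the shader; 1 means "ignore this light"
}

fn mark_skipped_lights(
    view_layers: &RenderLayers,
    lights: &[Light],
    gpu_lights: &mut [GpuDirectionalLight],
) {
    for (light, gpu_light) in lights.iter().zip(gpu_lights.iter_mut()) {
        // A light sharing no layer with the view is disabled for that view.
        gpu_light.skip = u32::from(!view_layers.intersects(&light.render_layers));
    }
}
```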
Assuming performance is OK, and I don't see why it wouldn't be given that the shader code is doing one fewer operation than it would have been, this is not controversial. So I'm just going to say it's not controversial.
I would appreciate a bit more perf investigation but I won't block on it.
I'll raise an objection on this basis. A SmallVec is still going to be 6x larger as a component and involves other base costs (i.e. the capacity check on every access). This is quite a lot to add where we were previously just fetching and doing bitwise operations on a u32. It's also expected to be on the majority of entities that should be rendered, so these costs can easily scale with scene size. My biggest concern is that once there are enough layers to spill onto the heap, the performance of visibility calculation will tank as the cache hit rate drops.
Expanding the fixed size to u64 or even u128 is likely less controversial, and is likely already more than sufficient given that Unity and Unreal already ship production games with a hard 32-render-layer maximum. A u128 is enough to give each of 100+ players in a Fortnite-like game their own unique view and still render a shared world state (though I personally would not implement it that way), and it would still use less memory and fewer instructions to process than a SmallVec.
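To put rough numbers on the size claim, here is a quick check one can run (my addition, not from the thread; exact sizes depend on the target and on smallvec's `union` feature):

```rust
use smallvec::SmallVec; // smallvec = "1" in Cargo.toml

fn main() {
    // The status quo component: a plain 32-bit bitmask, 4 bytes.
    println!("u32:                {}", std::mem::size_of::<u32>());
    // The proposed growable mask: roughly 24-32 bytes on 64-bit targets,
    // depending on whether smallvec's `union` feature is enabled.
    println!("SmallVec<[u64; 1]>: {}", std::mem::size_of::<SmallVec<[u64; 1]>>());
    // A plain Vec for reference: ptr + len + capacity = 24 bytes on 64-bit.
    println!("Vec<u64>:           {}", std::mem::size_of::<Vec<u64>>());
}
```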
I'd definitely suggest checking this against `many_cubes`, potentially with a variety of layer settings.
```diff
 #[reflect(Component, Default, PartialEq)]
-pub struct RenderLayers(LayerMask);
+pub struct RenderLayers(SmallVec<[u64; 1]>);
```
Suggested change:

```diff
-pub struct RenderLayers(SmallVec<[u64; 1]>);
+pub struct RenderLayers(SmallVec<[u64; 2]>);
```
With the `union` feature on, we can fit 2 u64s on the stack, alongside the one usize for the capacity check, before it starts to grow bigger than a base `Vec` (on 64-bit platforms). This should support at least 128 layers before it spills onto the heap.
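To illustrate how a mask with that shape behaves, a small sketch (my illustration with hypothetical helper names, not the PR's implementation): layer N lives in word N / 64, bit N % 64, so with two inline words nothing allocates until layer 128 or above is used.

```rust
use smallvec::SmallVec;

/// Hypothetical helper: mark `layer` as active in a growable mask.
fn with_layer(mask: &mut SmallVec<[u64; 2]>, layer: usize) {
    let word = layer / 64;
    if word >= mask.len() {
        mask.resize(word + 1, 0); // only allocates once `word` >= 2
    }
    mask[word] |= 1u64 << (layer % 64);
}

/// Hypothetical helper: do two masks share any active layer?
fn intersects(a: &SmallVec<[u64; 2]>, b: &SmallVec<[u64; 2]>) -> bool {
    a.iter().zip(b.iter()).any(|(x, y)| x & y != 0)
}

fn main() {
    let (mut view, mut light) = (SmallVec::new(), SmallVec::new());
    with_layer(&mut view, 3);
    with_layer(&mut light, 3);
    with_layer(&mut light, 200); // pushes `light` onto the heap
    assert!(intersects(&view, &light));
}
```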
Thanks, this is a good callout on this feature.
I definitely appreciate concerns about performance and do not want to regress the base case here. I'm happy to do some more testing as best I can.
I'm not sure I understand this comment. The baseline I'd expect is that neither the view nor the entity has a render layer, in which case we're just checking that the default equals the default. In fact, because we're providing a
That's a fair concern that I would also like to understand better, but it's also worth pointing out that the status quo is that the user simply cannot go beyond 32 layers.
In my use case I need to spawn a camera for every node in a node graph, which is necessarily unbounded. There are absolutely other ways that I could work around the limit, but they are very complex. I recognize this is more niche, so I don't want to ram through changes solely on my behalf, but it's a pretty hard blocker for me personally. I'll try to collect some more evidence with regard to the impact here.
Tested on my M2 MBP.
I see a 2.5% regression in `check_visibility` without adding any layers, from 428ns to 440ns. I don't think 12 nanoseconds (or 7 nanoseconds with the fast path) per frame for one thread for 160k entities is significant, but I don't write bullet hell games... If this is a concern I guess we'd have to use a feature flag, which would be messy. Separately, running this myself I was getting a panic in `prepare_lights` where you replaced
Can you check how much of an impact spilling onto the heap would have in the worst case?
Thanks for validating this, I've added it to
Barring some stuff around the edges regarding constness, we could do an ugly feature flag for this.
Added a guard for this. Curious why I didn't hit it on my Mac.
I ran it. Will check it's fixed shortly.
Works for me too now.
Don't want to claim to be a perf expert, but the numbers seem pretty reasonable to me? I'm sure there are scenarios we haven't covered yet. If the performance hit is acceptable, I don't think the additional maintenance burden is worth it, but I will just say that I was able to implement a feature flag like so that "just works":

```rust
#[cfg(feature = "unlimited_render_layers")]
#[path = "unlimited_render_layers.rs"]
mod render_layers;

#[cfg(not(feature = "unlimited_render_layers"))]
mod render_layers;
```
I don't think feature flagging this is worth it. That's a nasty bit of maintenance burden, and IMO the flexibility here is worth the performance cost.
I know this is pretty far into development, but have we considered a tagged pointer to a slice with its size stored inline? The render layer struct would be a single pointer whose address would define what it contains.

An intersection would be performed by branching on the tag (see the sketch below).

If most of our users only use the lower render layers or all render layers, intersections and clones would operate without allocating. Pointer provenance shouldn't be an issue since we only ever dereference pointers whose addresses we haven't modified in any way.
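Here is a minimal sketch of that idea (my illustration; all names are hypothetical and this is not what the PR implements). It assumes a 64-bit target, where a `u64` heap allocation is 8-byte aligned and the low pointer bit is therefore free to use as a tag; a reserved address could similarly encode "all layers" without allocating.

```rust
/// One machine word. Low bit 1: the upper 63 bits are an inline mask of
/// layers 0..=62. Low bit 0: a pointer to a heap block laid out as
/// [len, word0, word1, ...], i.e. the slice's size is stored inline in
/// the allocation so the struct stays one pointer wide.
struct TaggedLayers(usize);

enum LayersView<'a> {
    Inline(u64),
    Heap(&'a [u64]),
}

impl TaggedLayers {
    fn inline(mask: u64) -> Self {
        assert!(mask >> 63 == 0, "only 63 layers fit inline");
        Self(((mask << 1) | 1) as usize)
    }

    fn spilled(words: &[u64]) -> Self {
        // Prefix the words with their count so no separate length field
        // is needed. Leaked for brevity; a real type would implement Drop
        // and Clone. The u64 allocation is 8-byte aligned, so the tag is 0.
        let mut block = Vec::with_capacity(words.len() + 1);
        block.push(words.len() as u64);
        block.extend_from_slice(words);
        Self(Box::leak(block.into_boxed_slice()).as_ptr() as usize)
    }

    fn view(&self) -> LayersView<'_> {
        if self.0 & 1 == 1 {
            LayersView::Inline((self.0 >> 1) as u64)
        } else {
            // SAFETY: untagged values are only created by `spilled`,
            // whose blocks start with their length.
            unsafe {
                let ptr = self.0 as *const u64;
                let len = *ptr as usize;
                LayersView::Heap(std::slice::from_raw_parts(ptr.add(1), len))
            }
        }
    }

    fn intersects(&self, other: &Self) -> bool {
        use LayersView::*;
        match (self.view(), other.view()) {
            (Inline(a), Inline(b)) => a & b != 0,
            // The inline mask only covers bits 0..=62 of the first word.
            (Inline(a), Heap(w)) | (Heap(w), Inline(a)) => {
                w.first().map_or(false, |first| first & a != 0)
            }
            (Heap(a), Heap(b)) => a.iter().zip(b).any(|(x, y)| x & y != 0),
        }
    }
}

fn main() {
    let small = TaggedLayers::inline(0b1000); // layer 3
    let big = TaggedLayers::spilled(&[0b1000, 1 << 5]); // layers 3 and 69
    assert!(small.intersects(&big)); // both contain layer 3
}
```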
The powers that SME have decided that we should merge this as is, and tackle perf follow-ups in future PRs if they become meaningful (or someone wants to get nerd-sniped). Merging!
# Objective

- In example `render_to_texture`, #13317 changed the comment on the existing light saying lights don't work on multiple layers, then added a light on multiple layers explaining that it will work. It's confusing.

## Solution

- Keep the original light, with the updated comment.

## Testing

- Run example `render_to_texture`; lighting is correct.
# Objective

Remove the limit of `RenderLayer` by using a growable mask using `SmallVec`.

Changes adopted from @UkoeHB's initial PR here #12502 that contained additional changes related to propagating render layers.
## Solution

The main thing needed to unblock this is removing `RenderLayers` from our shader code. This primarily affects `DirectionalLight`. We are now computing a `skip` field on the CPU that is then used to skip the light in the shader.
## Testing

Checked a variety of examples and did a quick benchmark on `many_cubes`. There were some existing problems identified during the development of the original PR (see: https://discord.com/channels/691052431525675048/1220477928605749340/1221190112939872347). This PR shouldn't change any existing behavior besides removing the layer limit (sans the comment in migration about `all` layers no longer being possible).
## Changelog

Removed the limit on `RenderLayers` by using a growable bitset that only allocates when layers greater than 64 are used.
## Migration Guide

`RenderLayers::all()` no longer exists. Entities expecting to be visible on all layers, e.g. lights, should compute the active layers that are in use.
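As a sketch of one possible migration (my illustration, assuming the post-PR `RenderLayers::none()` and `with` constructors; `all_active_layers` and `layers_in_use` are hypothetical names an app might use, not an API from this PR):

```rust
use bevy::render::view::RenderLayers;

// Before this PR, a light meant to affect every layer could use
// `RenderLayers::all()`. That constructor is gone, so the app now has to
// know which layers it actually uses and build a mask from them.

/// Hypothetical helper: `layers_in_use` is a list the app would maintain.
fn all_active_layers(layers_in_use: &[usize]) -> RenderLayers {
    layers_in_use
        .iter()
        .fold(RenderLayers::none(), |mask, &layer| mask.with(layer))
}
```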