rsx: Implement xform-constant-based instancing #15483
For some reason OpenGL is now almost twice as fast in scenes with heavy foliage. I need to optimize a little more by falling back to a rebind when we can't do a hot-patch. I guess the same optimization can also be used for OpenGL.
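The "rebind when we can't hot-patch" fallback can be illustrated generically. This is a minimal sketch under loudly stated assumptions, not RPCS3 code: `ConstantBinding`, `update_constants`, and the `allocate` callback are hypothetical names, and "hot-patch" is modeled simply as overwriting the already-bound constant range when the new data fits.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ConstantBinding:
    buffer: bytearray  # backing storage that the bound range lives in
    offset: int        # start of the currently bound range, in bytes
    size: int          # length of the currently bound range, in bytes


def update_constants(
    binding: ConstantBinding,
    data: bytes,
    allocate: Callable[[int], ConstantBinding],
) -> Tuple[ConstantBinding, str]:
    """Hot-patch the bound range in place if the data fits, else rebind.

    `allocate` is a hypothetical allocator returning a fresh binding of
    at least n bytes. Returns the binding the draw should use and which
    path was taken ("patch" or "rebind").
    """
    if len(data) <= binding.size:
        # Fast path: overwrite the already-bound range, no rebind needed.
        start = binding.offset
        binding.buffer[start:start + len(data)] = data
        return binding, "patch"
    # Slow path: the data does not fit, so allocate a new range and rebind.
    fresh = allocate(len(data))
    fresh.buffer[fresh.offset:fresh.offset + len(data)] = data
    return fresh, "rebind"


def alloc(n: int) -> ConstantBinding:
    # Hypothetical allocator used only for this example.
    return ConstantBinding(bytearray(n), 0, n)


storage = bytearray(64)
bound = ConstantBinding(storage, offset=0, size=16)
b1, how1 = update_constants(bound, bytes(range(16)), alloc)  # fits: patch
b2, how2 = update_constants(bound, bytes(32), alloc)         # too big: rebind
```

The point of the split is that the patch path leaves the existing binding untouched, so no descriptor or pipeline state has to change; only the rare oversized update pays for a rebind.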
There's a lot more to optimize after looking at profile traces.
Vulkan performance is fixed on AMD cards, but NVIDIA is struggling with a 60% performance loss. We can work around that by restructuring how vertex programs work, which I'll have to do in parallel before this is merged.
I'm going to merge this one as-is. The NVIDIA-specific optimization work will come later and will never be as fast as the approach taken here. I will try updating drivers, use Nsight to check why the barriers are stalling so badly, and maybe report the issue to their driver team. Not much hope for quick action there, though.
I wonder if the spec violation (a buffer update in the middle of a renderpass) is what is causing the performance to tank. Maybe VK_KHR_dynamic_rendering can help out in that case. But every other "smart" trick I can think of, including uploading a whole indirection table, would require some kind of transfer op to patch the on-device lookup table, which leaves us back at square one.

EDIT: Turns out I'm too smart for my own good XD. There is no spec violation. Inserting a barrier automatically splits the renderpass. This is why NVIDIA is so slow: we have 6500 renderpasses per frame instead of a few dozen.
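The arithmetic behind "6500 renderpasses per frame instead of a few dozen" can be reproduced with a toy model. This is a simplified sketch, not how any real backend records commands: it assumes, as described above, that each inline constant update (transfer + barrier) for a draw that is not the first in its pass ends the current renderpass and begins a new one. `Draw`, `count_render_passes`, and the 20 × 325 frame layout are hypothetical, chosen only to match the quoted figure.

```python
from dataclasses import dataclass


@dataclass
class Draw:
    pass_id: int        # logical render pass the draw belongs to
    needs_update: bool  # True if constants are rewritten right before the draw


def count_render_passes(draws, inline_updates):
    """Count render pass instances recorded for one frame (toy model).

    inline_updates=True: an update for any draw that is not the first in
    its pass is recorded as transfer + barrier, which ends the current
    render pass and begins another. inline_updates=False: all updates are
    batched before the passes start, so each logical pass is recorded once.
    """
    passes = 0
    current = None
    for d in draws:
        if d.pass_id != current:
            passes += 1  # every new logical pass begins a render pass
            current = d.pass_id
        elif inline_updates and d.needs_update:
            passes += 1  # barrier splits the pass: end it and begin another
    return passes


# Illustrative frame: 20 logical passes x 325 instanced draws = 6500 draws.
frame = [Draw(p, True) for p in range(20) for _ in range(325)]
batched = count_render_passes(frame, inline_updates=False)
inline = count_render_passes(frame, inline_updates=True)
```

In this model the inline path records one renderpass per draw (6500), while batching the uploads keeps it at one per logical pass (20), which is the "few dozen" case.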
I need to think a bit more about this one. The only "fix" for NVIDIA here is to upgrade to VK 1.3.
…he size beforehand.
…ction for constant IDs
- This is allowed by spec when we don't care about what happens outside the renderpass
I didn't think the fix for my issue would require work this... extensive, but thanks a ton kd-11!
@kd-11 Awesome work! But is there still a performance issue for NVIDIA, or did you manage to find a workaround/fix? If not, I guess someone should open an issue to track that?
Can confirm that the instanced foliage in Resistance is now showing up as it should. Great work, thanks a lot kd-11.
There is a workaround in place, but performance will be worse on NV no matter what we do. There are over 6000 instances in some scenes, and NV drivers don't like the inline updates very much.
We had other forms of instancing previously, but those only covered setups that modify the index base or vertex base. Some Insomniac titles use full xform-based instancing (load-matrix -> draw -> load-matrix -> draw), which was not supported and is very performance-intensive. This PR addresses that.
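To make the load-matrix -> draw pattern concrete, here is a minimal sketch of collapsing it into instanced draws. It is a hypothetical command model, not the RSX command stream or this PR's implementation: `InstancedDraw`, `collapse_xform_instancing`, and the `("load_matrix", m)` / `("draw", key)` tuples are invented for illustration, and other state changes that would have to end a batch are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class InstancedDraw:
    draw_key: str                                   # geometry + state identity (hypothetical)
    transforms: list = field(default_factory=list)  # i-th entry = instance i's matrix


def collapse_xform_instancing(commands):
    """Merge load-matrix -> draw -> load-matrix -> draw runs into instanced draws.

    commands is a list of ("load_matrix", matrix) or ("draw", key) tuples.
    Consecutive draws with the same key become one InstancedDraw; a vertex
    program would then index its transform table by instance ID instead of
    relying on a fresh constant upload before every draw.
    """
    batches = []
    matrix = None
    for op, arg in commands:
        if op == "load_matrix":
            matrix = arg  # only remembered; consumed by the next draw
        elif op == "draw":
            if batches and batches[-1].draw_key == arg:
                batches[-1].transforms.append(matrix)  # extend current batch
            else:
                batches.append(InstancedDraw(arg, [matrix]))  # start new batch
        else:
            raise ValueError(f"unknown command: {op}")
    return batches


cmds = [
    ("load_matrix", "M0"), ("draw", "tree"),
    ("load_matrix", "M1"), ("draw", "tree"),
    ("load_matrix", "M2"), ("draw", "tree"),
    ("load_matrix", "M3"), ("draw", "rock"),
]
batches = collapse_xform_instancing(cmds)
```

Here the three "tree" draws collapse into a single instanced draw with three transforms, and "rock" stays a one-instance batch; the per-draw constant uploads become one upload of a transform table.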
Fixes #10754
TODO