rsx: Implement xform-constant-based instancing #15483

merged 19 commits into from May 12, 2024

Conversation

kd-11
Contributor

@kd-11 kd-11 commented Apr 19, 2024

We previously supported other forms of instancing, but only for setups that modify the index base or vertex base. Some Insomniac titles use full xform-constant-based instancing (load-matrix -> draw -> load-matrix -> draw), which was not supported and is very performance intensive. This PR addresses that.
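
To illustrate the load-matrix -> draw pattern described above, here is a minimal sketch of how such a command stream could be collapsed into instanced draws. The types `LoadMatrix`, `Draw`, `InstancedDraw` and the function `collapse_instances` are hypothetical names made up for this example; they are not RPCS3's actual structures, and the PR's real detection logic may work quite differently.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical command types for illustration only (not RPCS3's real ones).
using Matrix4 = std::array<float, 16>;

struct LoadMatrix { Matrix4 m; };                 // upload a transform constant
struct Draw { unsigned first; unsigned count; };  // draw a vertex range
using Command = std::variant<LoadMatrix, Draw>;

struct InstancedDraw {
    unsigned first;
    unsigned count;
    std::vector<Matrix4> instance_matrices;       // one matrix per instance
};

// Merge adjacent draws of the same vertex range into one instanced draw,
// recording the matrix that was active for each original draw.
std::vector<InstancedDraw> collapse_instances(const std::vector<Command>& cmds)
{
    std::vector<InstancedDraw> out;

    // Assumed initial state: identity matrix until the first LoadMatrix.
    Matrix4 current{};
    current[0] = current[5] = current[10] = current[15] = 1.0f;

    for (const auto& cmd : cmds) {
        if (const auto* lm = std::get_if<LoadMatrix>(&cmd)) {
            current = lm->m;
            continue;
        }
        const auto& d = std::get<Draw>(cmd);
        if (!out.empty() && out.back().first == d.first && out.back().count == d.count) {
            // Same geometry as the previous draw: treat it as another instance.
            out.back().instance_matrices.push_back(current);
        } else {
            InstancedDraw batch{d.first, d.count, {}};
            batch.instance_matrices.push_back(current);
            out.push_back(std::move(batch));
        }
    }
    return out;
}
```

Only adjacent draws are merged, so the original submission order is preserved. A stream of N load-matrix/draw pairs over the same mesh becomes a single batch carrying N matrices.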

Fixes #10754

TODO

  • Fix Vulkan crashes on AMD due to vkCmdUpdateBuffer abuse.
  • Optimizations.
    • Batch the updates on the barrier insertion side.
    • Directly submit ranges to the backend to update. This avoids the indexed register penalty.
  • Implement the feature on OpenGL.
  • Investigate unusual coloration on some mesh instances (broken attributes??) [NOPE, lighting is borked on real PS3 too]
  • Investigate and fix performance drop on Vulkan.
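
The "directly submit ranges" optimization in the list above can be sketched roughly as follows: instead of updating constants one register at a time, dirty register indices are coalesced into contiguous ranges that can each be submitted in one call. `ConstantRange` and `coalesce_ranges` are hypothetical names for this example, and this is only one plausible way to form the ranges, not necessarily what the PR does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of a contiguous run of transform constants.
struct ConstantRange {
    std::uint32_t first;  // index of the first dirty register in the run
    std::uint32_t count;  // number of consecutive dirty registers
};

// Turn an unordered list of dirty register indices (duplicates allowed)
// into the minimal set of contiguous ranges covering them.
std::vector<ConstantRange> coalesce_ranges(std::vector<std::uint32_t> dirty)
{
    std::sort(dirty.begin(), dirty.end());
    dirty.erase(std::unique(dirty.begin(), dirty.end()), dirty.end());

    std::vector<ConstantRange> ranges;
    for (std::uint32_t idx : dirty) {
        if (!ranges.empty() && ranges.back().first + ranges.back().count == idx) {
            ++ranges.back().count;       // extends the current run
        } else {
            ranges.push_back({idx, 1});  // starts a new run
        }
    }
    return ranges;
}
```

For example, dirty indices 4, 5, 6, 7, 20, 21 (with 5 reported twice) collapse to two ranges, {4, 4} and {20, 2}, so the backend would see two updates instead of seven.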

@Linear524

This comment was marked as off-topic.

@kd-11 kd-11 marked this pull request as ready for review April 23, 2024 02:09
@kd-11
Contributor Author

kd-11 commented Apr 23, 2024

For some reason OpenGL is now almost twice as fast in scenes with heavy foliage. I need to optimize a little more by just going for a rebind when we can't do a hot-patch. I guess the same optimization can also be used for OpenGL.

@kd-11 kd-11 marked this pull request as draft April 23, 2024 23:18
@kd-11
Contributor Author

kd-11 commented Apr 23, 2024

There's a lot more to optimize after looking at profile traces.

rpcs3/Emu/RSX/Program/program_util.h: two review threads (outdated, resolved)
@kd-11
Contributor Author

kd-11 commented Apr 28, 2024

Vulkan performance is fixed on AMD cards, but NVIDIA is struggling with a 60% performance loss. We can work around that by restructuring how vertex programs work, which I'll have to do in parallel before this is merged.

@kd-11 kd-11 marked this pull request as draft April 28, 2024 18:38
@kd-11 kd-11 marked this pull request as ready for review May 10, 2024 23:47
@kd-11
Contributor Author

kd-11 commented May 10, 2024

I'm going to merge this one as-is. The NVIDIA-specific optimization work will come later and will never be as fast as the approach taken here. I will try updating drivers and checking why the barriers are stalling so badly with Nsight and maybe reporting the issue to their driver team. Not much hope for quick action there though.

@kd-11
Contributor Author

kd-11 commented May 10, 2024

I wonder if the spec violation is what is causing the performance to tank (buffer update in the middle of a renderpass). Maybe VK_KHR_dynamic_rendering can help out in that case. But every other "smart" trick I can think of, including uploading a whole indirection table, would require some kind of transfer op to patch the on-device lookup table, which leaves us back at square one.

EDIT: Turns out I'm too smart for my own good XD. There is no spec violation. Inserting a barrier automatically splits the renderpass. This is why NVIDIA is so slow, because we have 6500 renderpasses per frame instead of a few dozen.
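
The renderpass count mentioned in the EDIT can be reproduced with a toy model. This is a simplified sketch with no Vulkan calls: `Op`, `count_render_passes`, `per_instance_stream` and `hoisted_stream` are hypothetical names, and the model only assumes that each barrier ends the current renderpass and that the next draw opens a new one. The hoisted stream is a hypothetical comparison point; the thread does not claim that all updates can actually be moved ahead of the draws.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Op { Draw, Barrier };

// Count renderpasses, assuming every barrier closes the pass in progress
// and a pass is (re)opened lazily by the next draw.
std::size_t count_render_passes(const std::vector<Op>& ops)
{
    std::size_t passes = 0;
    bool in_pass = false;
    for (Op op : ops) {
        if (op == Op::Draw) {
            if (!in_pass) {
                ++passes;
                in_pass = true;
            }
        } else {
            in_pass = false;  // barrier splits the renderpass
        }
    }
    return passes;
}

// One constant update (modelled as a barrier) before every instance draw.
std::vector<Op> per_instance_stream(std::size_t instances)
{
    std::vector<Op> ops;
    for (std::size_t i = 0; i < instances; ++i) {
        ops.push_back(Op::Barrier);
        ops.push_back(Op::Draw);
    }
    return ops;
}

// Hypothetical alternative: a single barrier up front, then all draws.
std::vector<Op> hoisted_stream(std::size_t instances)
{
    std::vector<Op> ops;
    ops.push_back(Op::Barrier);
    ops.insert(ops.end(), instances, Op::Draw);
    return ops;
}
```

With 6500 instances, `count_render_passes(per_instance_stream(6500))` yields 6500 passes, while the hoisted stream yields 1, which matches the order-of-magnitude gap described above.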

@kd-11 kd-11 marked this pull request as draft May 11, 2024 00:25
@kd-11
Contributor Author

kd-11 commented May 11, 2024

I need to think a bit more about this one. The only "fix" for NVIDIA here is to upgrade to VK1.3.

@kd-11 kd-11 marked this pull request as ready for review May 11, 2024 01:49
@kd-11 kd-11 merged commit fc92aef into RPCS3:master May 12, 2024
6 checks passed
@JimScript

I didn't think the fix for my issue would require work to be this...extensive, but thanks a ton kd-11!

@segevfiner

segevfiner commented May 12, 2024

@kd-11 Awesome work! But is there still a performance issue for NVIDIA or did you manage to find a workaround/fix? If not, I guess someone should open an issue to track that?

@homembarata

Can confirm that the instanced foliage in Resistance is now showing up as it should. Great work, thanks a lot kd-11.

@kd-11
Contributor Author

kd-11 commented May 13, 2024

@kd-11 Awesome work! But is there still a performance issue for NVIDIA or did you manage to find a workaround/fix? If not, I guess someone should open an issue to track that?

There is a workaround in place but performance will be worse on NV no matter what we do. There are over 6000 instances in some scenes and NV drivers don't like the inline updates very much.
We first need to get some real numbers on proper gaming GPUs and check if performance is unreasonable. If it's really bad then we can track it.

Successfully merging this pull request may close these issues.

Ratchet & Clank: Tools of Destruction/Resistance: Fall of Man instanced mesh system broken
6 participants