Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPU: Migrate buffers on GPU project, pre-emptively flush device local mappings #6794

Merged
merged 15 commits into from May 19, 2024

Conversation

riperiperi
Copy link
Member

@riperiperi riperiperi commented May 10, 2024

Essentially retreading #4540, but it's on the GPU project now instead of the backend. This allows us to have a lot more control + knowledge of where the buffer backing has been changed and allows us to pre-emptively flush pages to host memory for quicker readback. It will allow us to do other stuff in the future, but we'll get there when we get there.

Similar to #4540 , this should only affect performance on dedicated GPUs, not integrated or mobile. There's a lot of work done here that could help separate host imported buffers from copied ones in future, but that's not really important right now. The target is mainly NVIDIA, I don't know how this will affect AMD (hopefully not negatively).

Extra

  • Rewrites a bunch of buffer migration stuff. Might want to tighten up how dispose stuff works.
  • Fixed an issue where the copy for texture pre-flush would happen after the syncpoint.
    • May fix an issue where character lighting was sometimes broken on Splatoon 3.

TODO stuff

  • Do a lot of testing (i've only tested on nvidia desktop)
  • Less permanent effect for storage on fragment. Right now it forces the buffer to be device local, but it's possible that may change in the future. I'd prefer if it just had a very potent effect for keeping it device local or device local mapped.
    - Remove the "Auto" mode from Vulkan, as it isn't used anymore.

… mappings

Essentially retreading Ryujinx#4540, but it's on the GPU project now instead of the backend. This allows us to have a lot more control + knowledge of where the buffer backing has been changed and allows us to pre-emptively flush pages to host memory for quicker readback. It will allow us to do other stuff in the future, but we'll get there when we get there.

Performance greatly improved in Hyrule Warriors: Age of Calamity. Performance notably improved in TOTK (average). Performance for BOTW restored to how it was before Ryujinx#4911, perhaps a bit better.

- Rewrites a bunch of buffer migration stuff. Might want to tighten up how dispose stuff works.
- Fixed an issue where the copy for texture pre-flush would happen _after_ the syncpoint.

TODO: remove a page from pre-flush if it isn't flushed after a certain number of copies.
@riperiperi riperiperi added gpu Related to Ryujinx.Graphics performance Performance issue or improvement labels May 10, 2024
@github-actions github-actions bot added graphics-backend:vulkan Graphical bugs when using the Vulkan API graphics-backend:opengl Graphical bugs when using the OpenGL API labels May 10, 2024
@gdkchan
Copy link
Member

gdkchan commented May 11, 2024

Just for the record, this is my test on Zelda Tears of the Kingdom (on Ryzen 9 7900X and RTX 3060):
master:
image
image
PR:
image
image
(Around +30 fps on title screen, +10 ingame).

@gdkchan
Copy link
Member

gdkchan commented May 11, 2024

Splatoon 3 shadow issue seems to be fixed.
master:
image
PR:
image

@GamerzHell9137
Copy link

Tests done on the Ryzen 3600.

Monster Hunter Rise - Master 27 FPS / PR 30 FPS

Note

Cleans up the remaining fps drops in more stressed areas of the game.

image
image

Catherine - Master 72 FPS / PR 92

Note

FPS in general is higher but the more important thing is that its increasing the FPS lows from under 30 to over 30 FPS making it speed wise playable.

image
image

@Littl3Guy
Copy link

Slight performance improvement in Pokemon Scarlet, totk is still performing badly for different reasons (RDNA2) so no improvement there.
Master:
image
PR:
Untitled

I didn't notice any regressions in other games.

@lostromb
Copy link
Contributor

Testing in amd 5500XT - Windows - Vulkan.
FPS are about the same in Hyrule Warriors, Pokemon Scarlet, and Pikmin 4. Nothing remarkable there.

There may be a minor regression in Star Ocean 2

Branch FPS Mean Median p99% Frametime StdDev ms Stutter %
migration-rewrite 52.67 53.49 41.31 1.48 0.00
master 56.06 56.49 48.61 0.75 0.00

@riperiperi riperiperi marked this pull request as ready for review May 17, 2024 23:16
@ryujinx-mako ryujinx-mako bot requested a review from a team May 17, 2024 23:16
src/Ryujinx.Graphics.Gpu/Engine/MME/MacroHLE.cs Outdated Show resolved Hide resolved
src/Ryujinx.Graphics.Gpu/Memory/Buffer.cs Show resolved Hide resolved
src/Ryujinx.Graphics.Gpu/Memory/Buffer.cs Outdated Show resolved Hide resolved
src/Ryujinx.Graphics.Gpu/Memory/Buffer.cs Outdated Show resolved Hide resolved
src/Ryujinx.Graphics.Gpu/Memory/BufferBackingState.cs Outdated Show resolved Hide resolved
StorageWrite = 0x80,

#pragma warning disable CA1069 // Enums values should not be duplicated
StorageAtomic = 0xc0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the compiler stop complaining if you set it to:

StorageAtomic = StorageRead | StorageWrite

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it still doesn't like that. These aren't really like flags anyways, it's more like a sequential enum shifted up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I just noticed that in other places, we use this attribute:

[SuppressMessage("Design", "CA1069: Enums values should not be duplicated")]

src/Ryujinx.Graphics.Gpu/Memory/BufferStage.cs Outdated Show resolved Hide resolved
src/Ryujinx.Graphics.Vulkan/BufferHolder.cs Outdated Show resolved Hide resolved
else
{
memoryType = Vendor == Vendor.Nvidia ?
SystemMemoryType.DedicatedMemorySlowStorage :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why only NVIDIA has "slow storage".

@ryujinx-mako ryujinx-mako bot requested a review from a team May 18, 2024 12:40
private int _flushCount;
private int _flushTemp;
private int _lastFlushWrite = -1;
private readonly BufferAllocationType _baseType;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed too since its unused. Or is there a reason for keeping it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing right now I think, but it will still contain Auto when a buffer is auto selected as HostMapped, so it could be useful for debug.

Copy link
Member

@gdkchan gdkchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks! I tested a few games here and found no issues. It's nice to see buffer migration reworked, and the backend getting simplified too.

@gdkchan gdkchan merged commit eb1ce41 into Ryujinx:master May 19, 2024
12 checks passed
@lostromb
Copy link
Contributor

Final perf numbers, tested in Windows 11 AMD 5500XT and Nvidia Titan X
Astral Chain seems to improve on AMD. TOTK seems to improve on Nvidia.
No other clear movers in this set of games. Testing was limited because my monitor caps at 60hz

Game Branch FPS Mean Median p99% Frametime StdDev ms Stutter %
Tears of the Kingdom AMD Master 41.35 41.99 34.42 1.46 0.00
Tears of the Kingdom AMD PR 39.82 40.45 31.22 2.11 0.00
Tears of the Kingdom Nvidia Master 46.50 48.57 29.82 2.67 0.00
Tears of the Kingdom Nvidia PR 50.32 52.26 38.57 1.99 0.00
Astral Chain AMD Master 56.67 57.06 45.22 1.42 0.00
Astral Chain AMD PR 59.93 61.30 41.83 2.49 0.00
Star Ocean 2 AMD Master 52.71 53.62 41.66 1.41 0.00
Star Ocean 2 AMD PR 53.49 55.48 39.22 1.89 0.00
Star Ocean 2 Nvidia Master 78.26 78.96 65.97 0.71 0.05
Star Ocean 2 Nvidia PR 77.84 78.60 65.59 0.59 0.00

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gpu Related to Ryujinx.Graphics graphics-backend:opengl Graphical bugs when using the OpenGL API graphics-backend:vulkan Graphical bugs when using the Vulkan API performance Performance issue or improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants