[TESTERS NEEDED AGAIN] SPU: PUTLLC16 Optimization, SPU Analyzer capabilities upgrade #15429

Merged
merged 6 commits into from May 21, 2024


elad335
Contributor

@elad335 elad335 commented Apr 11, 2024

For a while, I had a few complex SPU optimizations in mind. One of them was the "PUTLLC16" loop optimization (see #8703).
The concept itself was great: detect atomic loops in SPU code that update at most 16 bytes of data, in order to bridge the gap between the atomic operation capacity of x86 and ARM, which is 16 bytes, and that of the Cell BE's SPU architecture, which is a whopping 128 bytes.
So in theory, if we can analyse the code to detect when it is possible for the atomic loop to update only 16 bytes (about a third of all SPU atomic loops in games are coded this way), the performance of that code would increase dramatically (especially on non-TSX CPUs, for which the implementation is slower compared to TSX). But as I started implementing analysis to detect this pattern across a variety of code from games, things started to get entangled, and many hacks were put into the original pull request in order to support as many code variations as possible for different code flows (mainly single backward loops and a single forward if inside the atomic update). This is both hacky and less valuable than equipping the SPU analyzer with cross-block analysis, which allows more optimizations to derive from it in the future and detection of all possible 16-byte atomic loop cases.
But this was no simple task: the underlying algorithm was difficult as hell to work out, and it took me a whole year to do it.
It was worth it though.
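
To make the 128-byte versus 16-byte gap concrete, here is a minimal C++ sketch. It is an illustration only and not the PR's code: the PR detects the pattern statically in SPU code, while this toy compares two snapshots of a reservation line at runtime. The names `cache_line`, `chunk_offset` and `single_modified_chunk` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

constexpr std::size_t line_size = 128; // size of an SPU reservation line
constexpr std::size_t chunk_size = 16; // widest atomic operation on x86/ARM hosts

using cache_line = std::array<std::uint8_t, line_size>;

// Byte offset of a 16-byte chunk inside the 128-byte line.
std::size_t chunk_offset(std::size_t index)
{
    assert(index < line_size / chunk_size);
    return index * chunk_size;
}

// Returns the index (0..7) of the only 16-byte chunk that differs between
// the two snapshots, or nullopt if no chunk or more than one chunk differs.
std::optional<std::size_t> single_modified_chunk(const cache_line& before, const cache_line& after)
{
    std::optional<std::size_t> found;

    for (std::size_t i = 0; i < line_size / chunk_size; i++)
    {
        const std::size_t off = chunk_offset(i);

        if (std::memcmp(before.data() + off, after.data() + off, chunk_size) != 0)
        {
            if (found)
            {
                return std::nullopt; // a second chunk changed: not a 16-byte update
            }

            found = i;
        }
    }

    return found;
}
```

When exactly one chunk differs, the whole update fits inside a single 16-byte host atomic operation, which is the case the optimization targets.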

Please test the performance of games. The difference will probably not be huge, but it should be noticeable in titles that have gaps between TSX and non-TSX CPUs.

Significant performance improvements have been noted in Red Dead Redemption, Spider-Man: Web of Shadows, Metal Gear Solid 4 and Metal Gear Solid Online. Do note that the results depend on the CPU.

What to expect and test:

  • SPU usage differences.
  • Performance differences.
  • Game compatibility and stability breakage.

Example of a simple SPU atomic loop with only 16 bytes of the reservation modified (notice how both STQR and LQR address the same offset and no other store/load types are used):
image
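
The condition described above (every quadword load and store inside the atomic loop addresses the same offset, and no other load/store type is used) can be sketched as a toy check over a simplified instruction list. This is an assumption-laden illustration, not the SPU analyzer's actual algorithm: the `Op` and `Insn` types and `find_putllc16_offset` are hypothetical, and real code needs the cross-block analysis discussed in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified instruction model (not RPCS3's real types).
enum class Op { GetLLAR, LoadQuad, StoreQuad, OtherMemAccess, Alu, PutLLC };

struct Insn
{
    Op op;
    std::uint32_t offset = 0; // byte offset into the 128-byte reservation buffer
};

// Returns the 16-byte-aligned offset if every quadword load/store between
// GETLLAR and PUTLLC touches the same chunk; nullopt otherwise.
std::optional<std::uint32_t> find_putllc16_offset(const std::vector<Insn>& loop)
{
    bool in_loop = false;
    std::optional<std::uint32_t> chunk;

    for (const Insn& i : loop)
    {
        switch (i.op)
        {
        case Op::GetLLAR:
            in_loop = true;
            chunk.reset();
            break;
        case Op::LoadQuad:
        case Op::StoreQuad:
        {
            if (!in_loop) break;                  // access outside the atomic loop: ignored here
            if (i.offset >= 128) return std::nullopt;

            const std::uint32_t aligned = i.offset & ~15u;
            assert((aligned & 15u) == 0);

            if (chunk && *chunk != aligned) return std::nullopt; // a second chunk is involved
            chunk = aligned;
            break;
        }
        case Op::OtherMemAccess:
            if (in_loop) return std::nullopt;     // any other load/store type disqualifies the loop
            break;
        case Op::PutLLC:
            if (!in_loop) return std::nullopt;    // PUTLLC without a preceding GETLLAR
            return chunk;                         // nullopt if nothing was accessed at all
        case Op::Alu:
            break;
        }
    }

    return std::nullopt; // no PUTLLC found
}
```

A loop that loads and stores only at offset 0x30 would be accepted with chunk 0x30, while one that also stores at 0x40 would be rejected.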

@elad335 elad335 added Enhancement Optimization Optimizes existing code LLVM Related to LLVM instruction decoders labels Apr 11, 2024
rpcs3/Emu/System.cpp (outdated, resolved)
rpcs3/main.cpp (outdated, resolved)
@Megamouse
Contributor

Also, please put the spu stuff in another PR than all the progress stuff.

@cipherxof
Contributor

cipherxof commented Apr 11, 2024

Slightly worse performance for me in MGS4.

12700K @ Stock /w AVX-512

PR

Screenshot from 2024-04-11 14-28-59

Screenshot from 2024-04-11 14-21-45

Master

Screenshot from 2024-04-11 14-24-44

Screenshot from 2024-04-11 14-23-42

@Nishikoi

No real discernible difference in MGS4

9900K @ 4.80Ghz

PR /w TSX Enabled

image

image

PR /w TSX Disabled

image

image

Master /w TSX Enabled

image

image

Master /w TSX Disabled

image

image

@elad335 elad335 force-pushed the analyser branch 6 times, most recently from b3b71fc to b72bfb1 Compare April 13, 2024 19:49
fs::file to_close_file;
{
auto reset = init_mtx->reset();
to_close_file = std::move(file.file);
Contributor

Don't forget to reset file.file after move
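
A minimal sketch of the move-then-reset pattern this comment asks for. It uses a hypothetical stand-in type, `file_handle`, and does not model RPCS3's `fs::file`; whether the real type needs the explicit reset depends on its move semantics, which are not shown in this thread.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a file wrapper; NOT RPCS3's fs::file.
struct file_handle
{
    int fd = -1;

    bool is_open() const { return fd != -1; }

    // Forget the descriptor without closing it (ownership has moved elsewhere).
    void clear() { fd = -1; }
};

// Takes ownership of src's descriptor and leaves src in a known-empty state.
file_handle take_and_reset(file_handle& src)
{
    file_handle out = std::move(src); // for this toy type the implicit move only copies fd
    src.clear();                      // explicit reset: do not rely on the moved-from state
    assert(!src.is_open());
    return out;
}
```

In this toy type the source would still look open after the move, so later code could act on a descriptor it no longer owns; clearing it explicitly removes that ambiguity.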

@elad335 elad335 force-pushed the analyser branch 3 times, most recently from dd707ec to 9041e98 Compare April 19, 2024 15:55
@elad335 elad335 changed the title SPU: PUTLLC16 Optimization, SPU Analyzer capabilities upgrade [TESTERS NEEDED] SPU: PUTLLC16 Optimization, SPU Analyzer capabilities upgrade Apr 19, 2024
@elad335
Contributor Author

elad335 commented Apr 19, 2024

Can someone retest this? I pushed many changes.

@A5362

A5362 commented Apr 19, 2024

I could give it a try. Do you recommend any games?

@cipherxof
Contributor

Metal Gear Online hangs on building the SPU cache. I'm also forced to close RPCS3 via the task manager.

Log contains a bunch of these:

F {SPU Worker 7} SIG: Thread terminated due to fatal error: Verification failed
(in file F:\rpcs3\rpcs3\Emu\Cell\SPUCommonRecompiler.cpp:6968[:7], in function evaluate_start_state)

as well as

S SPU: PUTLLC16 Pattern Detected! (put_pc=0x6814, is_pc_rel=0, offset=0x0, is_const=0, Gd63vsaR9xJkYQH5C22uKCF6tXbR) (putllc0=0, putllc16+0=37, all=38)

RPCS3.zip

@elad335 elad335 force-pushed the analyser branch 2 times, most recently from 112b952 to b3e3dc0 Compare April 19, 2024 20:21
@Jonathan44062

GOW Ascension crashes in the opening scene of the game when you start and see the first fury
RPCS3.log

On master it works fine.

@cipherxof
Contributor

cipherxof commented May 8, 2024

MGSO/MGS4 perf, other games and compatibility too

Latest update on Windows seems to be about identical in terms of performance to my previous tests.

I tested using Giga SPU block size.

@Nishikoi

Nishikoi commented May 9, 2024

MGS4
9900K @ 4.80Ghz
SPU Block Size: Giga

0.0.31 PR

image

0.0.32 PR

image

Master

image

@elad335
Contributor Author

elad335 commented May 10, 2024

Don't use SPU Block Size Giga when testing at the moment.

@A5362

A5362 commented May 15, 2024

Do you want some tests?

@elad335
Contributor Author

elad335 commented May 15, 2024

Sure.

@A5362

A5362 commented May 15, 2024

On Uncharted, the game crashes after I start a new game from the menu
RPCS3.log

@FlexBy420

> On Uncharted, the game crashes after I start a new game from the menu
> RPCS3.log

Relaxed zcull causes the game to crash. Overall you are using pretty weird settings, so set them back to default and only use the wiki ones.

@A5362

A5362 commented May 15, 2024

> > On Uncharted, the game crashes after I start a new game from the menu
> > RPCS3.log
>
> Relaxed zcull causes the game to crash. Overall you are using pretty weird settings, so set them back to default and only use the wiki ones.

That's what I did. It's an option on the wiki and I didn't change anything else.
The audio also didn't work.

@FlexBy420

Relaxed zcull and accurate xfloat are not listed on the game's wiki page lol

@A5362

A5362 commented May 15, 2024

> Relaxed zcull and accurate xfloat are not listed on the game's wiki page lol

It's the default option I get when I load it.

I gave it another try: I deleted the rpcs3 folder to start from zero, and now the game runs, but the audio still doesn't work. Zcull has nothing to do with it.
So my older log is still valid.

@elad335
Contributor Author

elad335 commented May 21, 2024

Any other issues?

@elad335
Contributor Author

elad335 commented May 21, 2024

YOLO

@Augusto7743

Tested with 2 games on a 6-core FX-6300 using default settings with accurate RSX reservations.
Spelunker HD: it previously used more than 65% total CPU; with this build it uses 50% total CPU.
NFS Hot Pursuit: this game does not really run on an FX-6300, but the frame rate was 9 FPS before and is 14~16 FPS in gameplay with this build!
Other games possibly show better performance with this build too.
Thanks again =)

@elad335
Contributor Author

elad335 commented May 23, 2024

Note that from now on, the PUTLLC16 optimization is disabled with RSX reservations.

@grester

grester commented May 25, 2024

It's been a few days since I last played GT5, and I must disclose that I use the GT5 online mod. Nevertheless, I had put in quite a few hours without a problem, but one of these latest updates absolutely broke it.
The game freezes as soon as I reach the main menu. I thought it was a corrupted save game, so I did a fresh start, reinstalled all updates once again and reapplied the mod; as soon as I bought the first car, the game crashed again.
I reverted RPCS3 back to #15611 and the old modded save game works again. I also use the unlocked FPS game patch.
Game version is 02.11
The kind of errors I was getting were:
E sys_fs: 'sys_fs_stat' failed with 0x80010006 : CELL_ENOENT, “/dev_hdd0/game/BCUS98114/USRDIR/PDIPFS/9/41/SR” [1]

RPCS3 config:
PPU: LLVM
SPU: ASMJIT
Vulkan
GPU without additional settings

PC SPECS:
i5-14600KF
RX 7800 XT (with latest stable Anti-Lag2 drivers but nothing GPU drivers enabled)

I also tested for a couple minutes Skate3 but it had no problem.
