Cached Interpreter 2.0 #12723

Open
wants to merge 5 commits into
base: master

mitaclaw
Contributor

It now supports variable-sized data payloads and memory range freeing. It's a little faster, too.

@mitaclaw
Contributor Author

This PR conflicts with #12714, and I would prefer if this PR were merged first. The virtual member functions added to JitBase for the JITWidget refresh are incompatible with this redesign. I already have updated functions in a separate branch that I would rather apply to the JITWidget refresh PR than this PR.

@mitaclaw mitaclaw force-pushed the cached-interpreter-2.0 branch 3 times, most recently from aeda4f4 to 508f2e7 Compare April 21, 2024 16:01
@mitaclaw mitaclaw force-pushed the cached-interpreter-2.0 branch 9 times, most recently from 910a7b4 to 4a3a1e9 Compare April 23, 2024 16:42
@Simonx22
Member

Simonx22 commented Apr 23, 2024

I tried to run this on my M1 MacBook Pro but it crashes as soon as I start a game. I tested this with Mario Kart Wii and the Wii Menu.
Crash report: https://gist.github.com/Simonx22/ca3863d29a2f569fba1bbb65da6044f4

Edit: oops, I didn't see that you just pushed something. I'll retry with the latest changes shortly.

@mitaclaw
Contributor Author

The latest commit was to avoid some baggage from sub-classing Common::CodeBlock. I see that the function I now avoid, Common::AllocateExecutableMemory, does have a special case for Apple platforms. Hopefully all is fixed?

@Simonx22
Member

Yeah, it works now (and it's ~2 FPS faster than latest dev)

Latest dev:
image

This PR:
image

@mitaclaw
Contributor Author

mitaclaw commented Apr 24, 2024

I simplified the Common::CodeBlock template for non-executable memory by using Common::AllocateMemoryPages rather than std::malloc. Now FreeCodeSpace, WriteProtect, and UnWriteProtect no longer need special cases. @JosJuice explained the reason for the earlier breakage on the Discord: "Using executable memory without Common::ScopedJITPageWriteAndNoExecute would definitely break things on Apple Silicon." Knowing that, I believe this change should be fine.
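For readers wondering why page allocation removes the special cases: protection calls such as POSIX `mprotect` operate on whole, page-aligned regions, which `std::malloc` does not guarantee. The sketch below illustrates that idea only. It assumes a POSIX system and uses `mmap`/`mprotect` directly; `PageBuffer` is a hypothetical class, and although its method names echo the ones mentioned above, it is not Dolphin's `Common::CodeBlock` or `Common::AllocateMemoryPages`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical illustration (POSIX only), not Dolphin code.
// Memory comes from mmap, so it is page-aligned and a whole number of pages,
// which means the same protect/free path works without special cases.
class PageBuffer
{
public:
  explicit PageBuffer(std::size_t size) : m_size(RoundUpToPage(size))
  {
    void* ptr = mmap(nullptr, m_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    m_data = (ptr == MAP_FAILED) ? nullptr : static_cast<std::uint8_t*>(ptr);
  }

  ~PageBuffer()
  {
    if (m_data != nullptr)
      munmap(m_data, m_size);
  }

  PageBuffer(const PageBuffer&) = delete;
  PageBuffer& operator=(const PageBuffer&) = delete;

  // Make the region read-only. Returns false on failure.
  bool WriteProtect() { return m_data != nullptr && mprotect(m_data, m_size, PROT_READ) == 0; }

  // Restore read/write access. Returns false on failure.
  bool UnWriteProtect()
  {
    return m_data != nullptr && mprotect(m_data, m_size, PROT_READ | PROT_WRITE) == 0;
  }

  std::uint8_t* Data() { return m_data; }
  std::size_t Size() const { return m_size; }

private:
  static std::size_t RoundUpToPage(std::size_t size)
  {
    assert(size > 0);
    const auto page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    return (size + page - 1) / page * page;
  }

  std::size_t m_size;
  std::uint8_t* m_data = nullptr;
};
```

Executable JIT memory on Apple Silicon needs additional handling, as the quote above points out; this sketch covers only the non-executable case.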

@mitaclaw mitaclaw force-pushed the cached-interpreter-2.0 branch 8 times, most recently from 997eb03 to 802c344 Compare May 10, 2024 04:04
WritePC is now needed far less, only for instructions that end the block. Unfortunately, WritePC still needs to update `PowerPCState::npc` to support the false path of conditional branch instructions. Both drawbacks should be smoothed over by optimized cached instructions in the future.
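As a rough illustration of the two ideas so far (variable-sized payloads, and writing the PC only at the instruction that ends a block), here is a minimal, self-contained sketch. Everything in it is hypothetical: `CpuState`, `BlockBuilder`, `AddImm`, `EndBlock`, and the convention that a callback returns the number of payload bytes it consumed are assumptions made for the example, not the design used in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the emulated CPU state (not Dolphin's PowerPCState).
struct CpuState
{
  std::uint32_t pc = 0;
  std::uint32_t npc = 0;
  std::uint32_t gpr[32] = {};
};

// Assumed convention: a callback returns the payload bytes it consumed,
// or a negative value to end the block.
using Callback = std::int32_t (*)(CpuState&, const std::uint8_t* payload);

// Packs [callback pointer][operands] records of differing sizes into one byte buffer.
class BlockBuilder
{
public:
  template <typename Operands>
  void Emit(Callback fn, const Operands& operands)
  {
    static_assert(std::is_trivially_copyable_v<Operands>);
    Append(&fn, sizeof(fn));
    Append(&operands, sizeof(operands));
  }

  const std::vector<std::uint8_t>& Code() const { return m_code; }

private:
  void Append(const void* data, std::size_t size)
  {
    const auto* bytes = static_cast<const std::uint8_t*>(data);
    m_code.insert(m_code.end(), bytes, bytes + size);
  }

  std::vector<std::uint8_t> m_code;
};

// memcpy avoids unaligned reads from the packed buffer.
template <typename Operands>
Operands ReadOperands(const std::uint8_t* payload)
{
  Operands operands;
  std::memcpy(&operands, payload, sizeof(operands));
  return operands;
}

struct AddImmOperands { std::uint8_t rd; std::uint8_t ra; std::int16_t imm; };
struct EndBlockOperands { std::uint32_t pc; std::uint32_t npc; };

// Ordinary instruction: does not touch pc or npc.
std::int32_t AddImm(CpuState& state, const std::uint8_t* payload)
{
  const auto op = ReadOperands<AddImmOperands>(payload);
  state.gpr[op.rd] = state.gpr[op.ra] + static_cast<std::uint32_t>(op.imm);
  return static_cast<std::int32_t>(sizeof(AddImmOperands));
}

// Block-ending instruction: the only place pc and npc are written.
std::int32_t EndBlock(CpuState& state, const std::uint8_t* payload)
{
  const auto op = ReadOperands<EndBlockOperands>(payload);
  state.pc = op.pc;
  state.npc = op.npc;
  return -1;
}

void Run(CpuState& state, const std::vector<std::uint8_t>& code)
{
  assert(!code.empty());
  const std::uint8_t* cursor = code.data();
  for (;;)
  {
    Callback fn;
    std::memcpy(&fn, cursor, sizeof(fn));
    cursor += sizeof(fn);
    const std::int32_t consumed = fn(state, cursor);
    if (consumed < 0)
      return;
    cursor += consumed;
  }
}

// Usage: two adds followed by one block-ending record.
CpuState RunDemoBlock()
{
  BlockBuilder builder;
  builder.Emit(AddImm, AddImmOperands{3, 0, 5});
  builder.Emit(AddImm, AddImmOperands{3, 3, -2});
  builder.Emit(EndBlock, EndBlockOperands{0x80000008, 0x8000000C});
  CpuState state;
  Run(state, builder.Code());
  return state;
}
```

In this sketch the two `AddImm` records carry 4 bytes of operands and the `EndBlock` record carries 8, and only `EndBlock` stores to `pc` and `npc`.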
This was a bigger performance boost than I expected.
I tried making InterpretAndCheckExceptions test only the relevant exceptions (EXCEPTION_DSI, EXCEPTION_PROGRAM, or both) using templating, but didn't find a significant performance boost in it. As I am learning, the biggest bottleneck is the number of callbacks emitted, not usually the actual contents of them.
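For context, the templating experiment described above might look roughly like the following. This is a sketch under assumptions: the bit values for `EXCEPTION_DSI` and `EXCEPTION_PROGRAM` are placeholders, and `CpuState`, `Instruction`, `RaiseDsi`, and `Nop` are hypothetical names. Only the idea of fixing the exception mask at compile time comes from the description.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder bit values for the sketch; not copied from Dolphin's headers.
constexpr std::uint32_t EXCEPTION_DSI = 1u << 0;
constexpr std::uint32_t EXCEPTION_PROGRAM = 1u << 1;

// Hypothetical stand-in for the emulated CPU state.
struct CpuState
{
  std::uint32_t exceptions = 0;
};

using Instruction = void (*)(CpuState&);

// Runs one instruction, then tests only the exception bits selected by Mask.
// Returns true if one of those exceptions is pending and the block should exit.
template <std::uint32_t Mask>
bool InterpretAndCheckExceptions(CpuState& state, Instruction instruction)
{
  assert(instruction != nullptr);
  instruction(state);
  return (state.exceptions & Mask) != 0;
}

// Hypothetical instructions for the usage example.
void RaiseDsi(CpuState& state) { state.exceptions |= EXCEPTION_DSI; }
void Nop(CpuState&) {}
```

A block compiler would pick the instantiation per instruction, for example `InterpretAndCheckExceptions<EXCEPTION_DSI>` for a load or store. Each variant is still one callback per instruction, which fits the observation that the number of callbacks matters more than their contents.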
@golivax

golivax commented May 17, 2024

In my non-scientific tests, this PR also gives me a 1.5-2.0 FPS increase on a Retroid Pocket 2S. Dolphin is getting so ridiculously optimized that more and more games are becoming playable on potato ARM hardware. Amazing work, guys!

3 participants