Large-scale PowerPC recompiler rework #641

Exzap · 2023-01-30T05:16:29Z

Disclaimer: This is work-in-progress. I'm opening this draft PR for visibility, so others can track progress and know not to alter recompiler code. Work started on this in November and the ETA for completion is somewhere in the span of the next few months, depending on my motivation.

Goals

I originally started work on the recompiler in 2014, and since then I have learned a lot more about state-of-the-art compiler and IR design. While I'm generally happy with the quality of our code translation, some of the design choices I made along the way make it hard to introduce further optimizations or fixes. A lot of the complexity currently sits in the x86-64 backend, which means all of it would have to be reimplemented when targeting another architecture.

Overall, the idea is to make both the front-end (PPC to IR) and the back-end (IR to x86-64) as "dumb" as possible so that all the complex logic can be shifted to operate on platform-independent IR, lowering the burden on platform-specific code.

State

Please do not report bugs yet. In fact, I don't recommend trying this out; it's an active construction site.

I know a lot of these are pretty abstract, so in the future I might add a few before-vs-after code examples to this text.

Q&A

Will this PR add ARM support?

No. But it will make adding a new target architecture a lot easier and if I am motivated enough I'll look into adding an aarch64 backend after this is done.

Will this make Cemu faster?

Maybe? After everything is done the recompiler should output faster code, but CPU execution speed generally isn't a bottleneck in Cemu so it's hard to predict whether there will be an actual difference.

What about the proposed plan to use LLVM?

I did quite a bit of research on that. The biggest downside is that LLVM is still quite JIT-unfriendly and comes with significant bloat. I'm not saying that it wouldn't work, but the cons outweigh the pros in my opinion. Plus, we already have a pretty sophisticated recompiler and it would be a waste to throw it away.
On a personal note, I enjoy working on custom solutions more than plugging in libraries, so it's easier for me to stay motivated and make progress. In terms of total effort, both solutions are about the same.

Wunkolo · 2023-01-30T05:43:54Z

What would be the scope of changing the x64 emitter over to something like xbyak?

With the current x64 emitter, adding a new instruction or class of instructions involves implementing the encoding for those instructions (REX, VEX, EVEX, ModR/M, SIB, etc.) from scratch, then implementing the new instruction itself, and also detecting the relevant CPUID flags. This redundant work could probably just be pushed onto a proven library.
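To make the encoding work mentioned above concrete, here is a minimal sketch of hand-encoding a single instruction, `mov r64, r64` (opcode `0x89 /r` with a REX.W prefix). The helper name `emitMovReg64` and the byte-vector output are assumptions made for illustration; this is not Cemu's emitter API or xbyak's.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Cemu or xbyak): encodes `mov dst, src`
// for two 64-bit general-purpose registers.
// Register numbers are 0-15 (rax=0, rcx=1, ..., r8=8, ..., r15=15).
void emitMovReg64(std::vector<uint8_t>& out, uint8_t dst, uint8_t src)
{
    // REX prefix layout: 0100WRXB
    //   W=1 selects 64-bit operand size
    //   R   extends ModR/M.reg (here: src)
    //   B   extends ModR/M.rm  (here: dst)
    const uint8_t rexR = static_cast<uint8_t>(((src >> 3) & 1) << 2);
    const uint8_t rexB = static_cast<uint8_t>((dst >> 3) & 1);
    out.push_back(static_cast<uint8_t>(0x48 | rexR | rexB));

    // Opcode: MOV r/m64, r64
    out.push_back(0x89);

    // ModR/M: mod=11 (register-direct), reg=src low bits, rm=dst low bits
    out.push_back(static_cast<uint8_t>(0xC0 | ((src & 7) << 3) | (dst & 7)));
}
```

Calling `emitMovReg64(buf, 0, 1)` appends the bytes `48 89 C8`, which is `mov rax, rcx`. Every additional instruction form (memory operands, VEX/EVEX prefixes) needs similar bit-level handling, which is the work a premade emitter would take over.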

Exzap · 2023-01-30T06:28:26Z

Thanks for pointing out Xbyak, I wasn't aware of it. The assemblers I looked at were always a bit overkill for our purposes, usually focusing on a human-friendly API rather than a simple interface for machine-generated code. We only need a very thin emitter, and Xbyak seems to be exactly that.

As part of this rework I also started a new "cleaner" x86-64 high-performance emitter which I auto-generate from encoding tables. The effort for this is relatively minimal, but using a premade emitter would certainly cut down the effort even further. I'll think about it.

amayra · 2023-05-16T22:51:13Z

Did you drop this project?

Exzap · 2023-05-17T10:33:47Z

Nah just busy with other stuff. I'll get back to this eventually

iMonZ · 2023-09-26T19:48:47Z

> Nah just busy with other stuff. I'll get back to this eventually

Thanks! ARM64 support would finally make the Cemu emulator complete and future-proof!

Wunkolo · 2023-09-26T20:40:03Z

On ARM64: I've been using oaknut on other projects. It is structured very similarly to xbyak.

Gabezin64 · 2023-10-13T00:20:31Z

This will finally fix the lens flare issue in The Wind Waker HD and Twilight Princess HD?

Exzap · 2023-10-13T13:34:18Z

> This will finally fix the lens flare issue in The Wind Waker HD and Twilight Princess HD?

That's a graphical issue. It's unaffected by this CPU rework.

Intermediate commit while I'm still fixing things, but I didn't want to pile too many changes into a single commit.

New: Reworked the PPC->IML converter to first create a graph of basic blocks and then turn those into IML segment(s). This was mainly done to decouple the IML design from PPC-specific knowledge like branch target addresses. The previous design also couldn't preserve cycle counting properly in all cases, since it was based on IML instruction counting.

The new solution supports functions with a non-contiguous body. A pretty common example of this is a function that ends with a trailing B instruction to some other place.

Current limitations:
- BL inlining not implemented
- MFTB not implemented
- BCCTR and BCLR are only partially implemented
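As a rough illustration of the "graph of basic blocks" step, the sketch below finds basic-block leaders (the first instruction of each block) in a decoded instruction list. The `SketchInstr` struct and `findBlockLeaders` function are hypothetical and greatly simplified; they are not the actual converter code in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified view of a decoded PPC instruction.
struct SketchInstr
{
    uint32_t addr;     // guest address of the instruction
    bool isBranch;     // true for B/BC-style instructions
    uint32_t target;   // branch target, only meaningful if isBranch
};

// Returns the addresses that start a basic block:
//  - the first instruction of the function
//  - every branch target that lies inside the function
//  - every instruction that directly follows a branch
std::set<uint32_t> findBlockLeaders(const std::vector<SketchInstr>& code)
{
    std::set<uint32_t> leaders;
    if (code.empty())
        return leaders;

    // Addresses that belong to this function. The list does not have to be
    // contiguous, so membership is checked via a set rather than a range.
    std::set<uint32_t> known;
    for (const SketchInstr& ins : code)
        known.insert(ins.addr);

    leaders.insert(code.front().addr);
    for (std::size_t i = 0; i < code.size(); i++)
    {
        if (!code[i].isBranch)
            continue;
        if (known.count(code[i].target) > 0)
            leaders.insert(code[i].target);
        if (i + 1 < code.size())
            leaders.insert(code[i + 1].addr);
    }
    return leaders;
}
```

Once the leaders are known, each block runs from one leader up to the next, and edges between blocks follow from the branch at the end of each block. Because membership is checked by address rather than by a contiguous range, a body that continues elsewhere after a trailing branch still fits this scheme.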

Instead of having fixed macros for BCCTR/BCCTRL/BCLR/BCLRL, we now have a single macro instruction that takes the jump destination as a register parameter. This also allows us to reuse an already loaded LR register (by something like MTLR) instead of loading it again from memory. As a necessary requirement for this, the register allocator now has support for read operations in suffix instructions.

The carry bit is now resident in a register-allocated GPR instead of being baked directly into IML instructions. All the PowerPC carry ADD* and SUB* instructions, as well as SRAW/SRAWI, have been reworked to use more generalized IML instructions for handling carry. IML instructions now support two named output registers instead of only one (easily extendable to an arbitrary count).
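A minimal sketch of what "carry in a register, two outputs" can look like, written as plain C++ rather than IML. The `AddCarryResult` struct and `addWithCarry` function are hypothetical names; the actual IML instruction layout is not shown in this thread.

```cpp
#include <cstdint>

// Two named outputs: the 32-bit sum and the carry-out (0 or 1),
// each of which could live in its own register-allocated GPR.
struct AddCarryResult
{
    uint32_t sum;
    uint32_t carry;
};

// Generic add with carry-in and carry-out.
// ADDC-style: pass carryIn = 0. ADDE-style: pass the current carry value.
AddCarryResult addWithCarry(uint32_t a, uint32_t b, uint32_t carryIn)
{
    // Widen to 64 bits so the carry-out ends up in bit 32.
    const uint64_t wide = static_cast<uint64_t>(a) + b + carryIn;
    return { static_cast<uint32_t>(wide), static_cast<uint32_t>(wide >> 32) };
}
```

For example, `addWithCarry(0xFFFFFFFF, 1, 0)` yields a sum of 0 and a carry of 1. Treating carry as an ordinary value means the register allocator and any optimization pass can handle it like any other register.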

CR bits are now resident in registers instead of being baked into the instruction definitions. The same goes for XER SO and the LWARX reservation EA and value. Reworked LWARX/STWCX, CRxx ops, and compare and branch instructions, as well as RC bit handling. Not all CR-related instructions are reimplemented yet. Introduced an atomic_cmp_store operation to allow implementing STWCX in architecture-agnostic IML. Removed legacy CR-based compare and jump operations.
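One way to picture an architecture-agnostic compare-and-store is a standard compare-exchange: the store only happens if memory still holds the value seen at reservation time. The sketch below uses `std::atomic` for that. The function name `atomicCmpStoreSketch` and its parameters are assumptions; the real `atomic_cmp_store` operand layout is not described here, and guest byte order is ignored.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical model of a compare-and-store operation.
// Returns true if `mem` still held `expected` and was replaced by `newValue`.
// An STWCX-like store would pass the value remembered by the preceding
// LWARX-like load as `expected`, and use the result as its success flag.
bool atomicCmpStoreSketch(std::atomic<uint32_t>& mem, uint32_t expected, uint32_t newValue)
{
    // compare_exchange_strong overwrites `expected` on failure; it is taken
    // by value here, so the caller's copy is left untouched.
    return mem.compare_exchange_strong(expected, newValue);
}
```

This only models the value comparison. It does not model the reservation address, which per the description above is kept separately.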

Exzap force-pushed the jit-work branch from cdfcd96 to 3590ad9 on March 13, 2023 04:10

jcrm1 mentioned this pull request Sep 26, 2023

CI: Add macOS build #274

Merged

Exzap added 20 commits December 17, 2023 16:38

Latte: Fix race condition on close during game boot

da9dd77

PPCRec: Use vector for segment list + deduplicate RA file

921a4fd

PPCRec: Use vector for instruction list

1f605b0

PPCRec: Move Segment and Instruction struct into separate files

95f3f07

PPCRec: Rename IML structs for better clarity

8ae2c96

PPCRec: Move debug printing + smaller clean up

0b76995

PPCRec: Move analyzer file + move some funcs to IMLInstruction

867053e

PPCRec: Move IML optimizer file

8c821ce

PPCRec: Move IML register allocator

6bf234d

PPCRec: Emit x86 movd for non-AVX + more restructuring

407e3fa

PPCRec: Move X64 files into subdirectory and rename

a3997f8

PPCRec: Fix merge conflicts

ea2a8cf

PPCRec: Fix single segment loop not being detected

c22954f

Also removed associatedPPCAddress field from IMLInstruction as it's no longer used

PPCRec: Remove now unused PPC_ENTER and jumpMarkAddress

7e133a4

PPCRec: Clean up unused flags

c65f9a4

PPCRec: Make LSWI/STWSI more generic + GPR temporaries storage

7c405e6

PPCRec: Make register pool for RA configurable

0c851bd

PPCRec: Rename register constants to avoid name collision

29440f5

Exzap added 21 commits December 17, 2023 16:38

PPCRec: New x86-64 code emitter

f688faf

PPCRec: New compare and cond jump instrs, update RA

8cf9787

Storing the condition result in a register instead of imitating PPC CR lets us simplify the backend a lot. Only implemented as PoC for BDZ/BDNZ so far.
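A small sketch of the idea for BDNZ (decrement CTR, branch if the result is non-zero): the condition becomes an ordinary boolean value that a generic conditional jump can consume, with no CR bits involved. `BdnzResult` and `lowerBdnzSketch` are hypothetical names used only for this illustration.

```cpp
#include <cstdint>

// Result of modelling one BDNZ: the updated counter and whether to branch.
struct BdnzResult
{
    uint32_t ctr;   // CTR after the decrement
    bool taken;     // condition value, held in a plain "register"
};

// Models BDNZ as three generic steps:
//   1. ctr  = ctr - 1
//   2. cond = (ctr != 0)      compare writes a boolean into a register
//   3. jump if cond           conditional jump reads that register
BdnzResult lowerBdnzSketch(uint32_t ctr)
{
    const uint32_t newCtr = ctr - 1;
    const bool cond = (newCtr != 0);
    return { newCtr, cond };
}
```

BDZ would be the same sequence with the comparison flipped to `newCtr == 0`.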

PPCRec: Streamline instructions + unify code for CR updates

943174e

PPCRec: Further unify CR code

8ba047f

PPCRec: Avoid complex optimizations in backend

ff5eee2

It's better to do it in a lowering pass so that the backend code can be kept as simple as possible

PPCRec: Refactoring and clean up

b88ac02

PPCRec: Refactor load/store instructions

c7b5c22

PPCRec: Use IMLReg in more places, unify and simplify var names

61cd720

PPCRec: Simplify PPC and IML logic instructions

2fa6899

Also implement PPC NAND instruction

PPCRec: Unify code + misc RA preparation

5958786

Whoopsie

PPCRec: Use IMLReg type in FPR RA

0278149

PPCRec: Use agnostic breakpoints

156e31e

PPCRec: Fix capitalization in include

011b95d

PPCRec: Initial support for typed registers

43b2d33

PPCRec: Partial support for typed registers in RA

c1e6120

PPCRec: Further work on support for typed registers in RA

45e7a38

Additionally there is no more range limit for virtual RegIDs, making the entire uint16 space available in theory

PPCRec: FPRs now use the shared register allocator

81f7418

PPCRec: Implement MFCR and MTCRF

800ce43

Fix compile errors due to rebase

7b04057

Exzap force-pushed the jit-work branch from 3590ad9 to a671611 on January 13, 2024 15:31

PPCRec: Dead code elimination + reintroduce pre-rework optimizations

570e2f6
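For readers unfamiliar with the term, dead code elimination on an IR can be sketched as a backward liveness walk: an instruction is kept only if it has a side effect or writes a register that is still needed later. The `SketchOp` struct and `eliminateDeadCode` function below are hypothetical and cover only a single straight-line segment; they are not the implementation in this commit.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, simplified IR instruction.
struct SketchOp
{
    int dst;                // virtual register written, -1 if none
    std::vector<int> srcs;  // virtual registers read
    bool hasSideEffect;     // e.g. memory stores; such ops are never removed
};

// Removes instructions whose result is never used.
// `liveOut` lists the registers still needed after the segment ends.
std::vector<SketchOp> eliminateDeadCode(const std::vector<SketchOp>& ops,
                                        const std::set<int>& liveOut)
{
    std::set<int> live = liveOut;
    std::vector<bool> keep(ops.size(), false);

    // Walk backwards so that uses are seen before their definitions.
    for (std::size_t i = ops.size(); i-- > 0;)
    {
        const SketchOp& op = ops[i];
        const bool needed = op.hasSideEffect || (op.dst >= 0 && live.count(op.dst) > 0);
        if (!needed)
            continue;
        keep[i] = true;
        // The definition ends the liveness of dst; the sources become live.
        // Erasing first handles ops that read and write the same register.
        if (op.dst >= 0)
            live.erase(op.dst);
        for (int s : op.srcs)
            live.insert(s);
    }

    std::vector<SketchOp> result;
    for (std::size_t i = 0; i < ops.size(); i++)
    {
        if (keep[i])
            result.push_back(ops[i]);
    }
    return result;
}
```

A full pass would also need liveness information across segment boundaries, which this sketch takes as the given `liveOut` set.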

Exzap force-pushed the jit-work branch from a671611 to 570e2f6 on January 13, 2024 16:15

Exzap mentioned this pull request Feb 18, 2024

Deleting some spaces #1088

Closed
