Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DolphinQt: JIT Widget Refresh #12714

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

mitaclaw
Copy link
Contributor

Dolphin's current (soon to be former) JIT widget, even back when it was the JIT window in DolphinWx, has never really done much. The only thing it is equipped to do is respond to the CodeWidget::RequestPPCComparison(u32 addr) signal emitted by the "PPC vs Host" action in the Code View widget's context menu. Furthermore, this single thing it does is misleading thanks to a series of questionable choices. In short, the JIT widget (attempts) to show a disassembly of recompiled code and the PowerPC code it came from.

How and where does it get all of this from just a single effective address? That's where it gets ugly.

First, the current JIT widget does the host code disassembly by calling DisassembleBlock, which in turn uses JitInterface::GetHostCode. JitInterface::GetHostCode searches to find a JitBlock with a starting address matching the provided effective address... or any JitBlock with a starting address up to -2000 away. It also needs to know the PowerPC feature flags of the block for this lookup, so it is just uses whatever is current†. If any block is found, JitInterface::GetHostCode will next check if the effective address lies within the bounds of block->effectiveAddress to block->effectiveAddress + block->originalSize††. JitBlock::originalSize is the size of the original PPCAnalyst::CodeBuffer that was recompiled for the JitBlock, and these code buffers do not necessarily represent a contiguous sequence of instructions! What's more more, thanks to branch following, the same effective address can be recompiled multiple times by far-off JitBlocks with starting addresses this function is designed to disregard. I repeat, multiple times; this function does not account for the fact that there could even be more than one answer.

† While this would be unacceptable in my refreshed JIT widget, I admit it was probably good enough within the context of the current JIT widget which only receives an address from the Code View widget, which is also bound by the current feature flags.
†† Actually, the upper bound is off-by-one due to it using >= rather than >.

Putting all of that aside, the JitInterface::GetHostCode will, if a JitBlock is found, return a JitInterface::GetHostCodeResult containing the starting effective address of the JitBlock, a pointer within the near code cache, and the size of the recompiled near code†, otherwise it will return some flavor of JitInterface::GetHostCodeError. DisassembleBlock will then std::visit the variant to either disassemble†† the near code, or get an error message. All of this is then returned via a second result struct, DisassembleResult, and finally we are back in the current JIT widget's update function.

† Information about the far code cache is completely left out of this exchange; the current JIT widget was designed before that existed.
†† In the case of the CachedInterpreter, the disassembly is done despite the data not containing host instructions. The result is nonsense.

After putting the text from the DisassembleResult into a QTextBrowser†† for the host code disassembly, the horror continues. Next, a whole slew of PPCAnalyst classes are spun up with a bunch of hard-coded configurations† to perform an analysis on the PPC address from the DisassembleResult. This is to create a CodeBuffer like what was originally used to recompile the near code. With this new CodeBuffer in hand, it is displayed in the other QTextBrowser†† for the PPC code disassembly. Finally, some extra stats about instruction counts and blowup are calculated to be stuffed into the end of the PPC code box "since it's generally the shortest". One of these statistics is an "estimated cycle count", which is not accurate for any block with early exits.

† While there may be other incongruencies, the one I am aware of is that these hard-coded configuration include OPTION_CONDITIONAL_CONTINUE and OPTION_BRANCH_FOLLOW, which the CachedInterpreter does not use.
†† I just want to vent a little. Why did this widget use QTextBrowsers for this instead of QPlainTextEdits? Why was it so important to make the error messages be italic with HTML tags?

old jit widget

So that's everything the current JIT widget is able to do, and honestly it's not very good at it. Technically, the JIT widget also has a refresh button that can run the update function again using the last effective address, but I'm not counting that. So you might be wondering what is the point of the table that takes up half of the widget, and the answer is: it does absolutely nothing! In JITWidget.cpp, this comment says it all:

// TODO: Actually do something with the table (Wx doesn't)

So with introductions out of the way and motivations clear, I introduce my JIT Widget Refresh. Here are some screenshots of it in action:

jit widget simple
jit widget complex
jit widget recompiles address filter
jit widget other filters
jit widget cached interpreter

Yes, that is fourteen columns. Don't worry, they are can be configured to hide and move around just like in my last major project, the Branch Watch dialog. It's almost difficult to explain all of the features this refreshed JIT widget has, but I will try anyway. First of all, the table now shows JitBlocks contained by the JitCache. This information is only available when emulation is paused. Also shown when emulation is paused is free memory and memory fragmentation info from the JIT on the status bar. Selecting any row will show a disassembly of that JitBlock's original CodeBuffer†, near host code, far host code, and instruction blowup stats on the status bar. The CachedInterpreter is even fully supported, featuring a disassembly of the simple bytecode it uses in the near code box. There are two actions in the context menu for the table. "View Code" shows the selected JitBlock's†† effective address in the Code widget. "Erase Block(s)" will erase individual JitBlocks from the JitCache, forcing those JitBlocks to recompile again without a full JitCache clear. Along the top of the widget are various controls. First, there are filters for min / max effective address, symbol name (derived from effective address), plus a filter for "Recompiles Physical Address" that allows you to see only JitBlocks that recompile a given physical address. Whenever the "PPC vs Host" action is signaled from the Code View widget, the address is translated and put into this filter, solving the multiple recompilations problem. This widget also has buttons mirroring actions in the JIT menu of the MenuBar, including "Clear Cache", a new "Wipe Profiling" action, and a button to start and stop software JIT profiling. Pro tip: you can click on the status bar to show memory stats if the instruction count stats are what's currently being shown.

† This is now done by storing a stripped-down copy of the original code buffer in the JitBlock, and this is only copied into the JitBlock when the debug interface is enabled.
†† When multiple rows are selected, the row that is right clicked (look for the dotted border on the cell) is the JitBlock which the effective address is taken from.

Unfortunately, I was a bit opportunistic with the following change; I wasn't satisfied with just fixing up one crusty relic at a time. There is a compile-time feature in the x86_64 JIT called LogGeneratedX86. Because the changes I made for the JIT widget made it so easy to do, I've updated this old feature to work for all three JITs. I know, not good for review purposes, but it just made sense and was hard to resist. Did you know the current LogGeneratedX86 doesn't even compile in the current master branch?

Alright, lightning round for the remaining changes. UICommon's Disassembler has been totally rewritten to not suck so bad and is now part of Common. Android also has access to JIT profile data wiping in the same debug menu I put the other JIT profiling stuff in. The Qt menu bar as well (this was implied earlier). I added a new Common::Unreachable function to mirror C++23's std::unreachable, because I wanted this widget to be as optimized as possible. 🦀 JitBlockData::codeSize is gone 🦀. The physical address std::set is now moved into the JitBlock instead of copied, it should be a lot faster that way. Oh, and I tried something new with the Qt class layouts this time, and I know it sort of violates a coding style guideline by having multiple public/private sections, but I found that by ordering it in this way, it made function dependencies flow chronologically in the header file and the cpp file really nicely. Maybe consider it? Alright, thank you for reading. I'm open to suggestions for how to split this into multiple commits, btw.

@mitaclaw mitaclaw force-pushed the jit-widget-refresh branch 4 times, most recently from d2e8ef4 to a9c3c6c Compare April 17, 2024 21:30
@mitaclaw
Copy link
Contributor Author

AArch64 platforms without LLVM's disassembler available (e.g. Windows on ARM) have no alternative HostDisassembler to use. Might be worth looking into this in the future.

@mitaclaw mitaclaw force-pushed the jit-widget-refresh branch 2 times, most recently from e5d7213 to fd5e22c Compare April 19, 2024 08:08
@mitaclaw mitaclaw marked this pull request as draft May 8, 2024 03:16
@mitaclaw mitaclaw force-pushed the jit-widget-refresh branch 4 times, most recently from 2da5f42 to 7f3b68a Compare May 10, 2024 04:46
It now supports variable-sized data payloads and memory range freeing. It's a little faster, too.
WritePC is now needed far less, only for instructions that end the block. Unfortunately, WritePC still needs to update `PowerPCState::npc` to support the false path of conditional branch instructions. Both drawbacks should be smoothed over by optimized cached instructions in the future.
This was a bigger performance boost than I expected.
I tried making InterpretAndCheckExceptions test only the relevant exceptions (EXCEPTION_DSI, EXCEPTION_PROGRAM, or both) using templating, but didn't find a significant performance boost in it. As I am learning, the biggest bottleneck is the number of callbacks emitted, not usually the actual contents of them.
Accessible from DolphinQt and Android.
I added a new `get_stats` member function to the upstream for use in the JIT Widget Refresh.
Fulfilling a certain six-year-old todo, and revamping LogGeneratedX86 as a bonus.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
1 participant