Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debugger: Overhaul symbol table parsing (add .mdebug and .sndata support) and add inspector GUI for complex data types #10224

Open
wants to merge 253 commits into
base: master
Choose a base branch
from

Conversation

chaoticgd
Copy link
Contributor

@chaoticgd chaoticgd commented Nov 4, 2023

Description of Changes

I have imported symbol table parsing, symbol database and C++ AST code from CCC, entirely replacing the existing symbol map class, and implemented a GUI to inspect data structures in memory using this information.

New screenshot:

Screenshot_20240314_184024

Old screenshot:

Screenshot_20231104_003704

Some highlights:

  • CCC library rewritten to support importing symbol from multiple different symbols tables at once (e.g. some translation units will lack debugging symbols but will have linker symbols) and user-defined symbols.
    • The new symbols database uses a handle system for accessing symbols. This is useful for retaining persistent references to symbols.
    • Support for MIPS debug (.mdebug), SNDLL (.sndata) and ELF (.symtab) symbol tables. For information about these formats, see the readme linked above.
    • This takes up quite a bit more memory (on the order of 100MB) than what was there before when a full .mdebug symbol table is present. If it's too much it could be possible to only load the symbols when the debugger is opened, but I haven't implemented that so far.
    • Do comment on whether it would be better for me to put this in pcsx2/DebugTools/ccc/ like I have now with the PCSX2 license headers, or in 3rdparty/ccc/ with my own license headers. The original library is MIT so the latter might make it easier for me to copy changes back and forth? It probably doesn't matter too much, so whatever is convenient is probably fine.
  • New SymbolGuardian class that controls access to the symbol database.
    • The SymbolMap class has been entirely removed and replaced with this.
    • When an ELF file is loaded a worker thread is created to parse the symbol tables. Parsing a fully loaded .mdebug symbol table usually takes about a second (on my PC anyway).
      • One odd side effect of this is if new code is loaded in the meantime (e.g. loading from a savestate), the result of analysis may depend on how long parsing the symbol table takes. I suspect a good solution to this would be let the user scan for functions on demand, but that seems slightly out of scope for this PR. I could make ScanForFunctions use the code from the ELF instead. Please comment on what would be preferable.
      • Some built-in types are automatically added to the database so that even without debug symbols the symbol trees are somewhat useful.
  • The symbol trees in the debugger GUI can be used to inspect complex data types in memory.
    • Uses Qt model/view system.
    • Options for adding/removing/modifying symbols using dialogs and delegates.
    • Options for grouping symbols by module/ELF section/source file.
    • Function hashing system to detect symbols that are no longer valid and gray them out.
  • Added the GNU demangler to replace the Avast one that was there before.
    • This is required to support the GCC 2.x mangling scheme used in lots and lots of games.
    • Based on code from GCC 13.2.0 with the old GCC 2.x demangler (circa 2018) ported back into it.
    • Includes my own fix for a memory safety issue with the cplus_demangle_opname function.
    • A readme is included that describes how I came up with the code that I did.
  • The disassembly manager doesn't really work very well with the new symbol database. For now I've patched it up so it at least mostly works.

Progress:

  • Refactor CCC to have acceptable error handling for integration with PCSX2.
  • Import symbol table parsing and AST code from CCC.
  • Create a data inspector window in the Qt GUI.
  • Create a QAbstractItemModel to inspect data structures in memory using this type information.
  • Implement the globals tab in the data inspector window.
  • Implement the stack tab in the data inspector window.
  • Rewrite CCC so it can replace the existing SymbolMap class.
  • Implement support for other kinds of symbol tables (.symtab, SNDLL) in CCC.
  • Merge the data inspector window into the debugger window.
  • Replace the SymbolMap class with the SymbolDatabase class from CCC.
  • Various improvements (better editor widgets, VS project files, etc).
  • Rebase everything.

Possible future extensions:

  • Support for DWARF 1 symbol tables, as were written out by the Metrowerks compiler. This would allow for many more games to be supported by the data inspector. I think the current architecture I have with the C++ AST would allow for this (with some changes), but it would still probably be a long-term goal.
  • Watch window. This was originally going to be part of this PR, but I've decided to skip it for now. Ideally it would have an expression parser for a C-like language.
  • JSON import/export. This would allow people to save their user-defined symbols, and to transfer certain symbols (e.g. data types) between games.
  • Undo/redo support.
  • Parser for a subset of C++. This would allow for new data types to be created by the user. Something like this would probably do.
  • Import symbol tables from IRX modules and SNDLL files. Hook the loader functions, import/delete the symbols on the fly.

Rationale behind Changes

Many games shipped with debug symbols in this format, so this would be an extremely useful addition for the modding communities of said games.

Suggested Testing Steps

Some games with .mdebug symbols (data types, functions, globals):

  • Alex Ferguson's Player Manager 2001
  • European Tennis Pro
  • Fatal Frame
  • Go Go Golf
  • Hard Hitter Tennis
  • Jet X2O
  • MTV Music Generator 2
  • Orange Pocket: Root
  • Sega Soccer Slam
  • The Sims
  • The Weakest Link

A game with SNDLL symbols (functions, globals):

  • Ratchet & Clank: Size Matters

@chaoticgd chaoticgd marked this pull request as draft November 4, 2023 00:39
@chaoticgd chaoticgd changed the title Add .mdebug symbol table parser and data inspector window Debugger: Add .mdebug symbol table parser and data inspector window Nov 4, 2023
@refractionpcsx2
Copy link
Member

refractionpcsx2 commented Nov 8, 2023

I don't know if anybody else agrees, but is there any reason this couldn't be in the debugger? I feel like they are kind of doing similar jobs since we have function symbols in there, and it would be nice to be able to right click on a variable and do "Go to in memory view" etc, all these features just feel like they would be at home as part of the debugger, rather than a separate screen.

Great features though!

Edit: just to clarify what I was thinking, if it was have it as a tab or something, so it doesn't add extra clutter to the debugger, not all of it on one screen, that might be a bit much!

@chaoticgd
Copy link
Contributor Author

chaoticgd commented Nov 8, 2023

I don't know if anybody else agrees, but is there any reason this couldn't be in the debugger?

There isn't any particular reason, especially now memory consumption is down a lot. That said, I wanted to focus on getting the functionality working first, then worry about integrating it with the existing debugger window. If anyone has more specific design ideas, certainly feel free to suggest them.

One constraint I have is that I think the watch window should at least be accessible alongside the rest of the debugger, so people can observe how values change as they step through code, just like with a regular GUI debugger. I think having it as a tab at the bottom would make sense, maybe with the option to pop it out as its own window.

Perhaps Qt Dock Widgets or KDDockWidgets would make sense here?

@refractionpcsx2
Copy link
Member

Yeah no worries.

I believe Fobes was thinking about changing the debugger so it was separate (docked?) windows, because the debugger is a bit cramped, so it would be nice to split it out, but I want expecting you to go that far with scope lol

This will allow users to inspect the values of local variables, and
possibly in the future parameters.
@chaoticgd
Copy link
Contributor Author

Got the stack tab working, with some caveats.

Screenshot_20231109_091444

The biggest issue is that for some functions the live ranges seem to be invalid.

It should be possible to display the values of parameters as well as locals, although the registers stored in the symbol table seem to be the registers used in the body of the function, not the prologue, so maybe there's something to be done there to make that more user friendly if you just break on the first instruction of a function.

@chaoticgd
Copy link
Contributor Author

Some further thoughts about integrating this with the debugger window:

One challenge I see is how we're going to approach the multiple sources of truth that are the two different symbol tables. I see two different approaches to deal with this:

  • Implement different tree models for the different symbol tables.
  • Implement support for the standard ELF symbol table in CCC, and have it process both symbol tables into the same data structure in memory.

The latter would probably be cleaner, although also more invasive.

@refractionpcsx2
Copy link
Member

I agree the latter sounds like a better option, try to keep the symbols unique, we don't really want multiple sources of truth if we can help it, but this is going to be true if it's integrated with the debugger or not.

@chaoticgd
Copy link
Contributor Author

Little update: I'm still working on this. Building the new data structures and porting my existing code to use them is taking some time. I'll copy it all into this PR when it's all relatively stable.

@github-actions github-actions bot added the Dependencies Pull requests that update a dependency file label Jan 7, 2024
This introduces a new "symbol database" which will allow for the
SymbolMap class to replaced entirely, so we can load all the symbols
from multiple different symbol tables into a single unified data
structure.

There are tons of other improvements too, including lots of fixes
for STABS parsing edge cases.

For now, the data inspector has been modified to use the new version
of the code.

New commit hash:
f79a17c1edf8e53fc226f309407ccccb2c1caf91
3rdparty/rcheevos/rcheevos Outdated Show resolved Hide resolved
@F0bes
Copy link
Member

F0bes commented Apr 17, 2024

I've been discussing how to get this merged with Dan and f0bes. Dan brought up how the updater shows the commit history for its changelog, so how merging over 200 commits might be problematic. I hence think it would be best to just do a squash merge for this one, instead of trying to rebase it.

I noticed you seem to be gearing up to do a stable release. Given the obscene size of this PR, I assume you're not going to want to merge this until that's out.

I don't have merge rights here, but I don't believe we can do our regular rebase commit here, it didn't let me do so on my fork. (I assume due to the merge commits):
image

A merge commit I'm pretty sure would break the updater. I tested it on a local branch and it ends up doing stuff like the following (this is bad for us because pr commits are assumed to be bunched together):
image

Finally, a squash merge works fine locally:
image

I think the best choice here is to do a squash commit when the maintainer merges this into master

@chaoticgd
Copy link
Contributor Author

chaoticgd commented May 2, 2024

Noticed an annoying regression with the disassembler. I'll look into it.

@chaoticgd
Copy link
Contributor Author

chaoticgd commented May 3, 2024

Noticed an annoying regression with the disassembler. I'll look into it.

False alarm. Or rather, the wrong alarm sounded. What I noticed was that symbols sometimes weren't being substituted into the disassembly properly. I turns out this was caused by a typo in this commit. Because instructions disassembled outside of a function won't touch this code path it tricked me into thinking it was a problem with my PR.

The flurry of commits is because while investigating this issue I noticed a bunch of other problems, which actually were bugs with this PR. These are all fixed, with the exception of handling some cases where function symbols overlap (e.g. because of overlays which all get dumped into a single symbol table), although it would probably be okay to wait for a future PR to see about handling that better as DWARF support may help with that

I also made some other miscellaneous improvements while I was at it.

EDIT: Actually there's one issue I forgot about. Fixed.

New CCC commit hash:
b0959281abb76f69aeba99c4c5bb3da95894e9d8
@stenzek
Copy link
Member

stenzek commented May 4, 2024

Is it possible to squash the commits down into a smaller number? For example, one to add the new demangler, another to swap over to it, etc.

It's quite difficult to review 252 commits over hundreds of files, and it's also not rebase-mergeable (which is our preferred technique).

@chaoticgd
Copy link
Contributor Author

chaoticgd commented May 4, 2024

Is it possible to squash the commits down into a smaller number? For example, one to add the new demangler, another to swap over to it, etc.

It's quite difficult to review 252 commits over hundreds of files, and it's also not rebase-mergeable (which is our preferred technique).

I'll look into that, although I'll have to learn more about rebasing first. I didn't previously spend too much time on that since the other contributors said it would be squashed anyway.

@stenzek
Copy link
Member

stenzek commented May 4, 2024

For smaller PRs, yeah, we can just squash them on merge. But with 50k lines added, I'd prefer to split it up, to make bisecting issues easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants