-
From what you wrote above, I get the impression that you might not be aware of how complex class layout in memory can be. A parent class is not always represented as a complete unit within a child class, especially when multiple inheritance is involved. Thus, you cannot always just place the parent inside the child at the offset where you believe it belongs, and any virtually inherited members will likely be disjoint from the rest of the parent.
As you may be aware, MSDIA-based processing using pdb.exe is our legacy capability for applying PDBs to programs. While the newer PDB Universal is nearly on par in terms of capabilities, it is still somewhat lacking in the areas of arguments and local variables. Moreover, I've been the biggest proponent of keeping the legacy capability around, as it has helped both in understanding how things should work and as a backup for situations where the newer capability has failed (e.g., an odd case in #3993).
That said, is there a reason why you'd like to put more focus on the legacy capability? Is it because you found the commented-out code? Or is it because you'd like to employ the pdb.exe XML output elsewhere?
I've put a significant amount of study into the object-oriented, class-based components of the PDB while developing the PDB Universal capability. A lot of code has been written against this understanding, though it is not turned on, for a number of reasons. The main reason was that, in the rush to get PDB Universal into the 9.2 delivery and turned on as the "default" instead of the legacy MSDIA-based capability, the decision was made not to perturb the processing too much from what the initial capability offered. We are now well beyond the 9.2 delivery...
Other reasons include the fact that we do not have a Ghidra design in place yet to hold object information, the fact that we had explored several ways to represent the class structures using our current "C" structure model, and that I wanted to rework how I approached laying out the classes as "C" structures (as the interim to what could be true class support). I could delve into this more if you are interested.
If you are able to work with Ghidra source code, I can tell you how to investigate the functionality of this currently-turned-off code. (We need some internal discussions regarding what should be provided in a supported capability.)
If we were to support contributions to the legacy capability, it would probably be lower priority for us than changes to PDB Universal, though, as long as we are supporting both, we might want their capabilities to remain relatively on par with each other. Note that changes only to pdb.exe are probably a non-starter, as we would, at a minimum, need to ensure that there is no breakage in the Java code that interprets the XML output, and a complete capability would probably create a lot more work to make it match the work that has already been investigated in the PDB Universal capability.
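
As a hedged illustration of the layout point made at the top of this reply, the following minimal C++ sketch measures where a virtually inherited base lands. The classes `Base`, `Left`, `Right`, and `Diamond` and the helpers `base_offset` and `has_single_base` are hypothetical names invented for this example; they do not come from Ghidra or from any PDB. The exact offsets are compiler- and ABI-specific, so the sketch only compares them rather than assuming particular values.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classes, for illustration only.
struct Base    { int base_value = 1; };
struct Left    : virtual Base { int left_value = 2; };
struct Right   : virtual Base { int right_value = 3; };
struct Diamond : Left, Right  { int diamond_value = 4; };

// Byte distance from the start of `obj` to its (virtual) Base subobject.
// The cast to Base is resolved at run time through the object's
// virtual-base bookkeeping, so the result depends on the complete object.
template <typename T>
std::ptrdiff_t base_offset(const T& obj) {
    const auto* start =
        reinterpret_cast<const unsigned char*>(&obj);
    const auto* base =
        reinterpret_cast<const unsigned char*>(static_cast<const Base*>(&obj));
    return base - start;
}

// With virtual inheritance, both paths reach the same single Base subobject.
bool has_single_base(const Diamond& d) {
    const Base* via_left  = static_cast<const Left*>(&d);
    const Base* via_right = static_cast<const Right*>(&d);
    return via_left == via_right;
}
```

On common compilers, `base_offset` returns one value for a standalone `Left` and a different value for the `Left` part of a `Diamond` (obtained with `static_cast<const Left&>(diamond)`). The `Base` members are therefore not at a fixed offset relative to `Left`, which is why a parent cannot always be dropped into the child as one contiguous unit.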
-
Hm, I found the output from the universal parser to be a bit inferior to the MSDIA-based one, and wasn't aware the latter was considered legacy, although at this point I'd need to re-evaluate the universal parser to say exactly what was inferior. I only started looking at pdb.exe recently, as I began building out an automatic analysis pipeline on a Linux machine to analyze a large swath of older versions of the program I'm looking at. I've been looking for an automated solution to the missing/undefined data in type structures for a while. While messing with the Microsoft MSDIA example, I saw that it correctly output the info I was looking for, and then also realized that the pdb.xml output had it identified as an unknown type at the correct offsets with the correct length, which is when I noticed the commented-out code. I would say that, as a user, it's non-obvious that PDB Universal is intended as the "future" and PDB MSDIA is intended as "legacy".
Yep, that should be no problem, although Java is not my preferred language :D Thanks for all the info.
-
I'm sorry that it wasn't clear as to which capability was legacy and which was new.
In PdbApplicatorOptions.java, the following line needs to be changed from false to true. Build and launch again to get additional options. If you can fork the repo, make the change, do the build, and run, then you are good. If that is too much, there may be ways to patch an individual class... details are in Ghidra/patch/README.txt. BTW, I'm curious as to which version of Ghidra you are using. We are considering adding a system property in a near-future release that would allow you to change this behavior without a recompile.
-
In master for 10.2 (b7cea82), I've incorporated a system property (see
I played with the class-related options a little the other day to refresh my memory. Until we have complete class support, I could see another layout model that would be completely flat (not implemented), which would show members in an extremely flat C model. As it is, I've created and nested chunks of the parent classes inside the child, which helps with placing all members at the appropriate alignment within the child. In the extremely flat model, I'd still need to come up with more appropriate (unique) member names for encapsulated VBTable/VFTable pointers.
Reminder: this is research/investigatory developer code, but if you are able to make use of it, it could help with the direction in which we continue pursuing it.
Also, if you are in a position to build code with MSFT VS, you can see how they lay out classes using a "hidden" option (I use it on the
In some of my code that I used to tease out how to create these layouts, I had the following structure (class), but I'm not giving you all the contents of its parents here. This class only has virtual members and inheritance. I did not include any virtual functions in it, so you are seeing lots of virtual base table pointers, but no virtual function table pointers.
The layout provided by the "hidden" option follows:
It will also give additional information about the table contents.
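
The nested-versus-flat distinction described in this reply can be sketched roughly as follows. This is only a toy model under stated assumptions, not the Ghidra implementation: `Member`, `Chunk`, `flatten`, and `flatten_layout` are hypothetical names, and qualifying each member with its owning chunk (`Diamond::Left::vbptr`) is just one possible way to keep encapsulated table-pointer names unique in a flat model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model: one member, with its offset relative to the owning chunk.
struct Member {
    std::string name;
    std::uint32_t offset;
    std::uint32_t size;
};

// Hypothetical model: a class chunk nested inside its child at `offset`.
struct Chunk {
    std::string name;
    std::uint32_t offset;          // relative to the enclosing chunk
    std::vector<Member> members;   // members owned directly by this chunk
    std::vector<Chunk> children;   // nested parent-class chunks
};

// Walk the nested chunks, emitting members with absolute offsets and
// names qualified by the chain of enclosing chunks.
void flatten(const Chunk& chunk, std::uint32_t base_offset,
             const std::string& prefix, std::vector<Member>& out) {
    const std::uint32_t abs = base_offset + chunk.offset;
    const std::string qualified =
        prefix.empty() ? chunk.name : prefix + "::" + chunk.name;
    for (const Member& m : chunk.members) {
        out.push_back({qualified + "::" + m.name, abs + m.offset, m.size});
    }
    for (const Chunk& child : chunk.children) {
        flatten(child, abs, qualified, out);
    }
}

// Produce the flat view, ordered by absolute offset.
std::vector<Member> flatten_layout(const Chunk& root) {
    std::vector<Member> out;
    flatten(root, 0, "", out);
    std::stable_sort(out.begin(), out.end(),
                     [](const Member& a, const Member& b) {
                         return a.offset < b.offset;
                     });
    return out;
}
```

With two nested parents that each carry a member named `vbptr`, the flat view keeps them apart as `Diamond::Left::vbptr` and `Diamond::Right::vbptr`. Any offsets fed into this model are illustrative inputs, not compiler output.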
-
OK, that's all good info to know. I didn't even realize virtual base tables were a thing. I'll need to spend some time messing around with that option and getting to better understand all of this in relation to the code I'm looking at. Something has come up in one of my other projects, so I've needed to focus over there, but I'm hoping to get back to this very soon. Also, thanks for adding the option; I'll get a build env for master set up so I can try it out.
Also, it sounds like you guys have some fantastic domain-specific knowledge from all the work you do. It'd be awesome if a lot of that knowledge (this kind of thing is great info for someone trying to understand) got pushed into a wiki or some kind of documentation aggregation beyond the docs within the source code; something more general in scope. I would also love to know what the anticipated future is for Ghidra analysis of fully C++ OO code with classes, templates, and all that kind of stuff: whether that is likely to get integrated more into the data structures (it sounds like it is, from what you've said) or whether Ghidra is expecting to stay more C-style focused.
-
Will keep this ticket open for a while, giving you time to get back to investigating this prototype work. Maybe 10.2 will be out by the time you have time to do this. I don't know if we can do the other things you ask, but I will keep them in mind. At least people should be able to find the discussion we are having here. We've certainly been interested in having fuller C++/OO analysis, and we have made progress under the hood in various areas that will help support this work, but I cannot supply a road map or timeline. I'm trying to get back to namespace/template work, as it is an important part of this, though if I can further the prototype class work that is in the PDB, I'd want to work that in as well.
-
Stumbled across this while attempting to create a VS2010 MFC type library for Ghidra 11; I'm building a DLL in VS2010 with a pre-compiled header that includes every type header, and importing that into Ghidra. I noticed that Ghidra refused to populate the inherited members of derived classes, no matter which PDB importer I used. Am I out of luck until further support lands, or is there a way to fix this?
-
Is your feature request related to a problem? Please describe.
While analyzing a Microsoft Visual Studio generated PDB using the Ghidra pdb.exe XML export, the XML shows the following:
So it is almost identifying the base classes, with the correct offset and length, but obviously not the correct kind. I was contemplating writing a script to backfill all this data for my own purposes, but I figured I'd ask here whether there is a reason this data is not inserted into the type struct object. If there is no such reason, I'd be happy to take a whack at extending pdb.exe (I see there is already some code there that is commented out... which is also why I'm here) and the relevant portions of the Java code base to get the base class info to load into the data type struct at the correct offset. Otherwise, I'll just extract all my relevant data from the XML export and manually backfill it for myself.
Currently the output data type has 0x38 bytes of "undefined" starting at 0x0.
Describe the solution you'd like
The undefined space in the data type struct that corresponds to base classes should be typed as the appropriate base class data types.
Describe alternatives you've considered
As above, I can use scripting to fill the data in for myself. I'm just curious whether there is a reason the functionality doesn't exist already, since it seems like some work was done on it on the PDB XML side.
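
The backfill idea described above could be sketched as follows. This is a simplified model under stated assumptions, not Ghidra or pdb.exe code: `Component`, `BaseClassRecord`, and `backfill_base` are hypothetical names, and the sketch only handles the simple case where a base class occupies one contiguous range that lies entirely inside an "undefined" region (such as the 0x38 bytes at 0x0 mentioned earlier). It does not handle virtually inherited members that are stored apart from the rest of the parent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of one component in a structure layout.
struct Component {
    std::uint32_t offset;
    std::uint32_t length;
    std::string type_name;  // "undefined" marks bytes with no assigned type
};

// Hypothetical base-class record (offset, length, name), such as a script
// might extract from an XML export.
struct BaseClassRecord {
    std::uint32_t offset;
    std::uint32_t length;
    std::string type_name;
};

// Replace the part of an undefined region covered by `base` with a typed
// component, keeping any leftover bytes before or after it as undefined.
// Returns false if no undefined region fully contains the base range.
bool backfill_base(std::vector<Component>& layout, const BaseClassRecord& base) {
    if (base.length == 0) return false;
    const std::uint32_t b_end = base.offset + base.length;
    for (std::size_t i = 0; i < layout.size(); ++i) {
        const Component c = layout[i];  // copy: layout is modified below
        if (c.type_name != "undefined") continue;
        const std::uint32_t c_end = c.offset + c.length;
        if (base.offset < c.offset || b_end > c_end) continue;

        std::vector<Component> pieces;
        if (base.offset > c.offset) {
            pieces.push_back({c.offset, base.offset - c.offset, "undefined"});
        }
        pieces.push_back({base.offset, base.length, base.type_name});
        if (b_end < c_end) {
            pieces.push_back({b_end, c_end - b_end, "undefined"});
        }
        const auto pos = layout.begin() + static_cast<std::ptrdiff_t>(i);
        layout.insert(layout.erase(pos), pieces.begin(), pieces.end());
        return true;
    }
    return false;
}
```

Because the function refuses to touch components that already have a type, running it twice with overlapping records leaves the first result in place and reports the second as not applied.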
Additional context
https://github.com/NationalSecurityAgency/ghidra/blob/master/Ghidra/Features/PDB/src/pdb/cpp/symbol.cpp#L300