Read/write support PDB files #59

Open
Washi1337 opened this issue May 27, 2020 · 6 comments

@Washi1337
Owner

No description provided.

@Washi1337
Owner Author

Related #91

@Washi1337 Washi1337 changed the title Add PDB support Read/write support PDB files Aug 26, 2020
@Washi1337
Owner Author

Washi1337 commented Jun 8, 2021

Formats

There seem to be multiple PDB formats in use.

  • PDB v7
  • PDB v2
  • Portable CILDB
  • Portable PDB

PDB 7

Uses the MSF format and seems to be the one used most by compilers such as VC++ and Clang/LLVM. This will require a lot of new models to be added to AsmResolver and is probably worth a separate package (perhaps called AsmResolver.Pdb).

The official spec seems to be very lacking. Microsoft only "published" parts of the definitions. LLVM has some docs on it as well, which seem more useful than Microsoft's. Wikipedia's docs are also very sparse (https://en.wikipedia.org/wiki/Program_database).

We might be able to look for reference implementations such as pdbparse.
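As a starting point for the MSF container, here is a minimal sketch of reading the superblock at the start of a PDB v7 file. The field layout follows the LLVM documentation on the MSF file format as I understand it; the names `SuperBlock`, `parse_superblock` and `directory_block_count` are made up for illustration and are not part of any existing package.

```python
import struct
from typing import NamedTuple

# 32-byte magic at the start of an MSF 7.00 file (per the LLVM MSF docs).
MSF7_MAGIC = b"Microsoft C/C++ MSF 7.00\r\n\x1aDS\x00\x00\x00"


class SuperBlock(NamedTuple):
    """Hypothetical model of the six 32-bit fields that follow the magic."""
    block_size: int
    free_block_map_block: int
    num_blocks: int
    num_directory_bytes: int
    unknown: int
    block_map_addr: int


def parse_superblock(data: bytes) -> SuperBlock:
    """Validate the magic and decode the little-endian superblock fields."""
    if not data.startswith(MSF7_MAGIC):
        raise ValueError("not an MSF 7.00 file")
    if len(data) < len(MSF7_MAGIC) + 24:
        raise ValueError("truncated superblock")
    return SuperBlock(*struct.unpack_from("<6I", data, len(MSF7_MAGIC)))


def directory_block_count(sb: SuperBlock) -> int:
    """Number of blocks needed to hold the stream directory (rounded up)."""
    return -(-sb.num_directory_bytes // sb.block_size)
```

Everything after this (block map, stream directory, the individual streams) builds on these six values, which is why the "file system" part probably needs its own set of models.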

PDB 2

Very little information seems to be available about this format. Wikipedia (https://en.wikipedia.org/wiki/Program_database) mentions its existence and gives some details, but not a lot. It seems to resemble PDB v7 in some respects, but we will need samples to confirm this.

pdbparse implements this format as well.

Portable PDB format

This seems to be emitted by the Roslyn compilers from the new .NET SDK, and closely resembles the .NET metadata directory, extended with a #Pdb stream as well as some extra tables in the #~ stream. Official spec here. Lots of the existing .NET metadata models in AsmResolver can probably be reused.
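To illustrate how close this is to regular .NET metadata, here is a rough sketch that walks the metadata root of a Portable PDB and lists its stream headers, so the #Pdb stream can be located. The layout used is the metadata root from ECMA-335 (Partition II); the function name `read_stream_headers` is hypothetical, and the sketch assumes the file starts directly with the metadata root.

```python
import struct

METADATA_SIGNATURE = 0x424A5342  # "BSJB" in little-endian


def read_stream_headers(data: bytes) -> dict[str, tuple[int, int]]:
    """Return {stream name: (offset, size)} from a .NET metadata root."""
    signature, _major, _minor, _reserved, version_length = struct.unpack_from(
        "<IHHII", data, 0
    )
    if signature != METADATA_SIGNATURE:
        raise ValueError("missing BSJB metadata signature")

    # The version string length is stored already padded to a multiple of 4.
    offset = 16 + version_length
    _flags, stream_count = struct.unpack_from("<HH", data, offset)
    offset += 4

    streams = {}
    for _ in range(stream_count):
        stream_offset, stream_size = struct.unpack_from("<II", data, offset)
        offset += 8
        end = data.index(b"\x00", offset)
        name = data[offset:end].decode("ascii")
        # Name plus null terminator is padded to the next 4-byte boundary.
        offset += (len(name) + 1 + 3) & ~3
        streams[name] = (stream_offset, stream_size)
    return streams
```

If `"#Pdb"` shows up in the result, the file is most likely a Portable PDB rather than the metadata of a normal assembly.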

Portable CILDB

Partition V of ECMA-335 specifies another format called CILDB. This documentation is good but I am not sure which compilers emit these types of files as I have not seen any sample with a PDB file like this.
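Whichever packages we end up with, something will have to decide which format a given file is in. A minimal sketch of sniffing the format from the leading bytes is shown below. The PDB v2 and v7 signatures are the ones I believe pdbparse checks for, and "BSJB" is the regular .NET metadata signature; CILDB is left out on purpose since I have no samples to verify against. `PdbFormat` and `sniff_format` are hypothetical names.

```python
from enum import Enum


class PdbFormat(Enum):
    PDB2 = "PDB v2"
    PDB7 = "PDB v7"
    PORTABLE_PDB = "Portable PDB"
    UNKNOWN = "unknown"


# Leading bytes per format (assumed, see the note above; CILDB not covered).
_SIGNATURES = [
    (b"Microsoft C/C++ program database 2.00\r\n\x1aJG", PdbFormat.PDB2),
    (b"Microsoft C/C++ MSF 7.00\r\n\x1aDS", PdbFormat.PDB7),
    (b"BSJB", PdbFormat.PORTABLE_PDB),
]


def sniff_format(header: bytes) -> PdbFormat:
    """Guess the symbol file format from the first bytes of the file."""
    for magic, fmt in _SIGNATURES:
        if header.startswith(magic):
            return fmt
    return PdbFormat.UNKNOWN
```

Usage would be along the lines of `sniff_format(open(path, "rb").read(64))`, after which the matching format-specific reader can be picked.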

Design choices

Given the complexity of these formats, it might be best to introduce new packages that can handle these types of files. However, a couple of important design choices need to be made fairly quickly. These mainly relate to where the implementations of these formats live.

For portable PDBs, given that the format is really just an extension of the already existing .NET metadata file format, it would probably make sense to add the raw metadata table row structs to the AsmResolver.PE.DotNet.Metadata namespace (located in the AsmResolver.PE package) to stay consistent with the rest of the metadata table models. However, for higher-level interpretation of these tables (e.g. interpretation of blobs and name indices), it would make more sense to put it either in AsmResolver.DotNet or in a separate package.

We could introduce a separate package called AsmResolver.Pdb. This makes sense given the complexity of PDB v2 and v7 (it implements a file system). However, if we introduce such a package, its name may be confusing, as it suggests support for any of the PDB formats, including the Portable PDB file format. If we include support for Portable PDB in this new package, that might result in a dependency on AsmResolver.PE or even AsmResolver.DotNet. The latter especially is not desirable, since it would mean that users who are only interested in reading native PEs with symbols would also have to reference AsmResolver.DotNet, which adds another 300 KB worth of code that they will never use.

Another idea is to introduce multiple PDB related packages instead. It could perhaps look like:

  • AsmResolver.Pdb: For common PDB file format abstractions
  • AsmResolver.Pdb.Pdb2: For PDB2
  • AsmResolver.Pdb.Pdb7: For PDB7
  • AsmResolver.Pdb.PortablePdb: For portable PDBs.
  • AsmResolver.Pdb.CilDb: For CILDB

There might also be the possibility to merge the PDB2 and PDB7 versions into one single package as these formats seem to resemble each other somewhat.

The great benefit of this approach is that it better follows the modular design style of AsmResolver in general, as AsmResolver.Pdb.PortablePdb and AsmResolver.Pdb.CilDb will be able to depend on AsmResolver.DotNet without the others also needing to do so. The obvious downside is that the number of new packages increases a lot, and users of AsmResolver might not like that.
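To make the dependency argument concrete, here is a small sketch that models the proposed packages as a graph and computes what each one would pull in transitively. The edges are assumptions based on the split described above (only the Portable PDB and CILDB packages reference AsmResolver.DotNet), not a final design.

```python
# Assumed direct dependencies for the proposed multi-package layout.
DEPENDENCIES = {
    "AsmResolver.Pdb": [],
    "AsmResolver.Pdb.Pdb2": ["AsmResolver.Pdb"],
    "AsmResolver.Pdb.Pdb7": ["AsmResolver.Pdb"],
    "AsmResolver.Pdb.PortablePdb": ["AsmResolver.Pdb", "AsmResolver.DotNet"],
    "AsmResolver.Pdb.CilDb": ["AsmResolver.Pdb", "AsmResolver.DotNet"],
    "AsmResolver.DotNet": ["AsmResolver.PE"],
    "AsmResolver.PE": [],
}


def transitive_dependencies(package: str, graph=DEPENDENCIES) -> set[str]:
    """Collect every package reachable from `package` (excluding itself)."""
    seen = set()
    stack = list(graph[package])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph[dep])
    return seen


if __name__ == "__main__":
    for name in sorted(DEPENDENCIES):
        print(name, "->", sorted(transitive_dependencies(name)))
```

With these edges, someone who only references AsmResolver.Pdb.Pdb7 never gets AsmResolver.DotNet, while in the single-package variant that includes Portable PDB support every consumer would.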

@Washi1337 Washi1337 added this to the 4.7.0 milestone Jun 15, 2021
@Washi1337 Washi1337 modified the milestones: 4.7.0, 4.8.0 Aug 7, 2021
@Washi1337 Washi1337 removed this from the 4.8.0 milestone Nov 28, 2021
@zziger

zziger commented Dec 4, 2021

Any news regarding implementation of that?

@Washi1337
Owner Author

Unfortunately, no concrete implementations yet. As it is right now, other features and bug reports have taken precedence over completely new features such as PDB file support. If there is demand for PDB support however, I may move this feature up the backlog.

@ds5678
Contributor

ds5678 commented Mar 28, 2022

I would be interested in helping with this. My interest lies mostly in the PDB 7 format and a little in the Portable PDB format. Even though it increases the package count, I am in favor of your suggestion to do 4 or 5 packages for this. It seems cleaner and most software publishes as a single file.

@Washi1337
Owner Author

Washi1337 commented Mar 30, 2022

@ds5678, Thanks for taking interest. Next to the package design, a couple additional big questions still need to be answered as well, which will also probably answer indirectly which new packages we will finally end up with. Some thoughts below:

This feature will probably require quite a bit of prep work before actual coding and integration can take place. One big aspect we need to figure out is whether it is possible to find some kind of unifying API design that abstracts at least some parts of every format into a higher-level model. At least for read support this would be preferable, as it would simplify usage of the packages a lot. However, too much simplification can also lead to certain features of some formats being forgotten or hidden, which we need to be careful about. One thing I can already predict is that writers are most likely going to have to define their own contracts for their respective formats within their respective packages. I don't think we can (or want to) find a unifying API for this, given the vast number of differences between these formats.
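Purely as a thought experiment, a unifying read-only contract could look something like the sketch below. All names here (`SymbolReader`, `LocalSymbol`, `InMemorySymbolReader`) are hypothetical; the in-memory reader only stands in for a real format-specific implementation so the sketch can run.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LocalSymbol:
    """Format-independent view of one local variable symbol."""
    method_token: int
    index: int
    name: str


class SymbolReader(ABC):
    """Hypothetical read-only contract that every format could implement."""

    @abstractmethod
    def get_locals(self, method_token: int) -> Iterable[LocalSymbol]:
        """Return the local variable symbols of the given method."""


class InMemorySymbolReader(SymbolReader):
    """Stand-in for a real reader (e.g. one for PDB v7 or Portable PDB)."""

    def __init__(self, symbols: Iterable[LocalSymbol]) -> None:
        self._symbols: List[LocalSymbol] = list(symbols)

    def get_locals(self, method_token: int) -> List[LocalSymbol]:
        return [s for s in self._symbols if s.method_token == method_token]
```

Format-specific features would stay on the concrete reader classes, so nothing gets hidden for users that need them, and writers would not share this contract at all.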

Pdb7 and PortablePdb seem to be the ones used the most nowadays, so this is definitely where we should focus first. Another question is how (if at all) these packages should be integrated in AsmResolver.DotNet, especially given that .NET uses both Pdb7 (legacy .NET Framework) and PortablePdb (.NET Core / .NET). For example, the names of local variables of method bodies are stored in these symbols. Other libraries (such as dnlib and Cecil) provide a Name property on their class representing local variables that pulls data from the PDB. Do we want something similar as well, or will this inevitably lead to tighter coupling of the packages, something I think we really should try to avoid?
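One possible way to get a Name property without a hard package dependency is to let the variable model accept an optional resolver callback, which a PDB package could supply. This is only a sketch under that assumption; `LocalVariable`, `NameResolver` and the `V_<index>` fallback are made up here and do not reflect how dnlib, Cecil or AsmResolver actually model this.

```python
from typing import Callable, Optional

# Hypothetical callback: (method token, local index) -> name, or None if unknown.
NameResolver = Callable[[int, int], Optional[str]]


class LocalVariable:
    """Local variable model that knows nothing about any PDB format."""

    def __init__(
        self,
        method_token: int,
        index: int,
        resolver: Optional[NameResolver] = None,
    ) -> None:
        self.method_token = method_token
        self.index = index
        self._resolver = resolver

    @property
    def name(self) -> str:
        """Name from symbols if a resolver provides one, else a placeholder."""
        if self._resolver is not None:
            resolved = self._resolver(self.method_token, self.index)
            if resolved is not None:
                return resolved
        return f"V_{self.index}"
```

For example, a resolver backed by a plain dictionary:

```python
names = {(0x06000001, 0): "counter"}
local = LocalVariable(0x06000001, 0, lambda token, index: names.get((token, index)))
print(local.name)
```

The core package would then only depend on the callback shape, and the coupling to a specific symbol format would live entirely on the side that provides the resolver.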

@Washi1337 Washi1337 mentioned this issue Apr 14, 2022
5 tasks
3 participants