AsmResolver.Symbols #297

Open
1 of 5 tasks
Washi1337 opened this issue Apr 14, 2022 · 0 comments
AsmResolver.Symbols

Status

Overview

Besides executable files, many compilers also generate debug symbols. These contain additional metadata about the program that is not strictly required for its execution but helps immensely in debugging the target. This includes, for instance, mappings from binary code to source code lines, as well as the names and types of (local) variables.

While AsmResolver can currently locate PDB files (through CodeView data debug directories in the PE), it cannot yet interpret the contents of PDB files or similar symbol formats. This proposal is about adding read/write support for files containing these debug symbols, with the idea of changing the general package dependency graph to the following:

[Image: AsmResolver Symbols package dependency graph]

New packages are highlighted in green, while modified packages are highlighted in yellow. More packages may be introduced in the future for other formats (such as MDB, CilDB or DWARF), which is why the package is called AsmResolver.Symbols and not AsmResolver.Pdb. Future packages should follow a similar pattern: they all derive from AsmResolver.Symbols and may depend on other existing packages for their domain. What is important is that the symbol packages stay separate from the normal executable parsing libraries, so that consumers that do not need to process symbols do not have to include code they are not going to use.

AsmResolver.Symbols

This package serves as the base for all packages that deal with debug symbols, and is meant to unify read access to symbol information across different file formats. It does so by defining interfaces that each derived package implements.

The exact interfaces are not fully known yet; they will probably emerge once we have a good understanding of what we generally want to extract from symbol files. In any case, the package should expose at least the following fundamental types of debugging information:

  • Global symbols
  • Local variable symbols
  • Metadata about compilation units (files that were used to compile the executable)
  • Sequence points (mappings from offset ranges to source code line numbers)

This generalization is especially important for consumers that process .NET binaries, as .NET binaries may use either the Windows PDB or the newer Portable PDB file format.
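As a rough illustration of what such a format-agnostic surface could look like, the sketch below models the four kinds of information listed above. Every type and member name here (ISymbolFile, IFunctionSymbols, SequencePoint, and so on) is hypothetical and not part of the proposal; the actual interfaces are still to be determined.

```csharp
using System.Collections.Generic;

// Hypothetical names throughout: the proposal has not fixed these interfaces yet.

/// <summary>A named symbol that is visible across the whole program.</summary>
public interface IGlobalSymbol
{
    string Name { get; }
}

/// <summary>A local variable declared inside a function.</summary>
public interface ILocalVariableSymbol
{
    string Name { get; }
    string TypeName { get; }
}

/// <summary>Metadata about one compilation unit and its source files.</summary>
public interface ICompilationUnit
{
    IEnumerable<string> SourceFiles { get; }
}

/// <summary>Maps a half-open offset range [StartOffset, EndOffset) to a source line.</summary>
public readonly struct SequencePoint
{
    public SequencePoint(uint startOffset, uint endOffset, string sourceFile, int line)
    {
        StartOffset = startOffset;
        EndOffset = endOffset;
        SourceFile = sourceFile;
        Line = line;
    }

    public uint StartOffset { get; }
    public uint EndOffset { get; }
    public string SourceFile { get; }
    public int Line { get; }

    public bool Contains(uint offset) => offset >= StartOffset && offset < EndOffset;
}

/// <summary>Debug information attached to a single function or method.</summary>
public interface IFunctionSymbols
{
    string Name { get; }
    IEnumerable<ILocalVariableSymbol> LocalVariables { get; }
    IEnumerable<SequencePoint> SequencePoints { get; }
}

/// <summary>Entry point that each format-specific package would implement.</summary>
public interface ISymbolFile
{
    IEnumerable<IGlobalSymbol> GlobalSymbols { get; }
    IEnumerable<ICompilationUnit> CompilationUnits { get; }
    IEnumerable<IFunctionSymbols> Functions { get; }
}

public static class SymbolQueries
{
    /// <summary>Returns the source line covering the offset, or null if none does.</summary>
    public static int? FindLine(IFunctionSymbols function, uint offset)
    {
        foreach (var point in function.SequencePoints)
        {
            if (point.Contains(offset))
                return point.Line;
        }

        return null;
    }
}
```

A consumer written against ISymbolFile alone would not need to know whether the data came from a Windows PDB or a Portable PDB, which is the point of the generalization.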

AsmResolver.Symbols.Pdb

This package is meant for reading and writing PDB files that follow the Windows PDB v7 (and possibly legacy v2) file format. This format is widely used by compilers targeting the Windows platform (such as MSVC++, LLVM-based compilers targeting Windows, and legacy VB and C# compilers).

We will not rely on the official implementations (such as the DIA SDK) by Microsoft. This is important as AsmResolver targets netstandard 2.0 with the intention that it should run on any platform, including non-Windows platforms. The package will therefore be fully responsible for reading and writing PDB files in this format.

A PDB file in this format is stored as a Multi-Stream Format (MSF) file. Since this is a completely different file format from PE, this package will be almost self-contained and will not rely on existing AsmResolver packages other than AsmResolver.Symbols. This means that all models required to represent PDB files in this format will be defined here, along with the code that implements the parsers and writers for it. Furthermore, the package will be designed in such a way that the PDB models are built on top of the MSF models, similar to how PEImage is built on top of PEFile. This keeps general usage of the API intuitive while retaining access to the lower-level structures, in case certain constructions (such as metadata streams for which the exact purpose or format is unknown) need to be preserved when rebuilding the file.
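To give an idea of the lowest layer, here is a minimal sketch of reading the header (superblock) of an MSF v7 file. The field layout follows the LLVM project's public description of the MSF format, not anything specified in this proposal, and the MsfSuperBlock type and its member names are hypothetical.

```csharp
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical model of the MSF v7 superblock; names are illustrative only.
public sealed class MsfSuperBlock
{
    // 32-byte signature at the start of every MSF v7 file.
    private static readonly byte[] Magic =
        Encoding.ASCII.GetBytes("Microsoft C/C++ MSF 7.00\r\n\u001ADS\0\0\0");

    public uint BlockSize { get; private set; }
    public uint FreeBlockMapBlock { get; private set; }
    public uint BlockCount { get; private set; }
    public uint DirectoryByteCount { get; private set; }
    public uint Unknown { get; private set; }
    public uint BlockMapAddress { get; private set; }

    // Number of blocks needed to hold the stream directory (rounded up).
    public uint DirectoryBlockCount => (DirectoryByteCount + BlockSize - 1) / BlockSize;

    public static MsfSuperBlock Read(Stream stream)
    {
        // BinaryReader reads little-endian integers, which matches the format.
        using (var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
        {
            byte[] magic = reader.ReadBytes(Magic.Length);
            if (!magic.SequenceEqual(Magic))
                throw new InvalidDataException("Not an MSF v7 file.");

            var result = new MsfSuperBlock();
            result.BlockSize = reader.ReadUInt32();
            result.FreeBlockMapBlock = reader.ReadUInt32();
            result.BlockCount = reader.ReadUInt32();
            result.DirectoryByteCount = reader.ReadUInt32();
            result.Unknown = reader.ReadUInt32();
            result.BlockMapAddress = reader.ReadUInt32();

            // The LLVM documentation lists these as the valid block sizes.
            uint size = result.BlockSize;
            if (size != 512 && size != 1024 && size != 2048 && size != 4096)
                throw new InvalidDataException("Unexpected MSF block size: " + size);

            return result;
        }
    }
}
```

Higher-level PDB models would then be layered on top of the streams that the MSF directory describes, in the same spirit as PEImage sitting on top of PEFile.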

AsmResolver.Symbols.PortablePdb and AsmResolver.PE

This package is meant for reading and writing PDB files that follow the new Portable PDB file format, which was introduced with .NET Core and is now the standard format used by modern VB and C# compilers.

Contrary to AsmResolver.Symbols.Pdb, this package will not contain raw models for the actual PDB file format. This may sound counterintuitive at first, but Portable PDB is in fact a direct extension of the .NET metadata format specified in ECMA-335. In particular, the extension mostly defines new tables within the existing metadata model used in the tables stream (#~) of the binary, and the debug data thus takes the form of rows in new metadata tables. As such, to have a more streamlined and consistent API, the reading and writing of the individual models will be integrated directly into AsmResolver.PE (specifically the AsmResolver.PE.DotNet namespace). This should be pretty straightforward, since most of the parsing infrastructure for PE files and metadata streams is already in place. We simply have to define a new PdbStream representing the #Pdb metadata stream, extend types such as TablesStream and TableIndex, and define new Row structures to represent all new models defined by this format.

The sole remaining purpose of AsmResolver.Symbols.PortablePdb is to implement the interfaces defined by AsmResolver.Symbols, in such a way that AsmResolver.PE does not depend on AsmResolver.Symbols. This avoids large amounts of code being pulled in by consumers that want to use AsmResolver.PE or AsmResolver.DotNet but have no need for symbol processing.
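For reference, the sketch below shows the kind of additions this implies. The table numbers and the #Pdb stream layout are taken from the Portable PDB specification; the type names (PortablePdbTable, DocumentRow, PdbStreamSketch) are hypothetical stand-ins and are not how AsmResolver.PE necessarily would or does define them.

```csharp
using System.IO;

// Table numbers assigned by the Portable PDB specification.
// The enum name is hypothetical; in practice these would extend TableIndex.
public enum PortablePdbTable : byte
{
    Document = 0x30,
    MethodDebugInformation = 0x31,
    LocalScope = 0x32,
    LocalVariable = 0x33,
    LocalConstant = 0x34,
    ImportScope = 0x35,
    StateMachineMethod = 0x36,
    CustomDebugInformation = 0x37
}

// Raw row of the Document table: every column is an index into a heap.
public readonly struct DocumentRow
{
    public DocumentRow(uint name, uint hashAlgorithm, uint hash, uint language)
    {
        Name = name;
        HashAlgorithm = hashAlgorithm;
        Hash = hash;
        Language = language;
    }

    public uint Name { get; }           // #Blob index (document name blob)
    public uint HashAlgorithm { get; }  // #GUID index
    public uint Hash { get; }           // #Blob index
    public uint Language { get; }       // #GUID index
}

// Hypothetical reader for the contents of the #Pdb metadata stream.
public sealed class PdbStreamSketch
{
    public byte[] Id { get; private set; }                   // 20-byte PDB id
    public uint EntryPointToken { get; private set; }        // MethodDef token, or 0
    public ulong ReferencedTypeSystemTables { get; private set; }
    public uint[] TypeSystemTableRows { get; private set; }

    public static PdbStreamSketch Read(BinaryReader reader)
    {
        var result = new PdbStreamSketch();

        result.Id = reader.ReadBytes(20);
        if (result.Id.Length != 20)
            throw new EndOfStreamException("Truncated #Pdb stream.");

        result.EntryPointToken = reader.ReadUInt32();
        result.ReferencedTypeSystemTables = reader.ReadUInt64();

        // One row count follows for every bit set in the table bit vector.
        int count = 0;
        for (ulong mask = result.ReferencedTypeSystemTables; mask != 0; mask &= mask - 1)
            count++;

        result.TypeSystemTableRows = new uint[count];
        for (int i = 0; i < count; i++)
            result.TypeSystemTableRows[i] = reader.ReadUInt32();

        return result;
    }
}
```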
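The dependency direction described here can be sketched as a thin adapter. The namespaces below only stand in for the three packages, and all type names (DocumentInfo, ICompilationUnit, PortablePdbCompilationUnit) are made up for illustration.

```csharp
using System.Collections.Generic;

namespace Sketch.PE
{
    // Stand-in for a model living in AsmResolver.PE. It knows nothing about
    // the symbol interfaces, so the PE package stays free of that dependency.
    public sealed class DocumentInfo
    {
        public DocumentInfo(string name)
        {
            Name = name;
        }

        public string Name { get; }
    }
}

namespace Sketch.Symbols
{
    // Stand-in for a format-agnostic interface defined by AsmResolver.Symbols.
    public interface ICompilationUnit
    {
        IEnumerable<string> SourceFiles { get; }
    }
}

namespace Sketch.Symbols.PortablePdb
{
    using Sketch.PE;

    // The adapter package references both of the others and bridges them.
    public sealed class PortablePdbCompilationUnit : ICompilationUnit
    {
        private readonly IReadOnlyList<DocumentInfo> _documents;

        public PortablePdbCompilationUnit(IReadOnlyList<DocumentInfo> documents)
        {
            _documents = documents;
        }

        public IEnumerable<string> SourceFiles
        {
            get
            {
                foreach (var document in _documents)
                    yield return document.Name;
            }
        }
    }
}
```

Only the adapter namespace references both of the others, so a consumer of the PE models alone never pulls in the symbol interfaces.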

Open Design Questions

  • How do we "integrate" this into AsmResolver.DotNet, if at all? Ideally, we want it to be as easy as possible to attach debug information to IMetadataMembers, but this may introduce tighter coupling that we really want to avoid.

Related

#59
