AsmResolver.Workspaces #298

Open
Washi1337 opened this issue Apr 18, 2022 · 0 comments
Labels
enhancement workspaces Issues related to the Workspaces extensions of AsmResolver

Comments


Washi1337 commented Apr 18, 2022

AsmResolver.Workspaces

Overview

Currently, AsmResolver is mainly tailored for processing individual PE images. This can be problematic if images have dependencies on other images. Especially since the introduction of .NET Core, it is very typical for a .NET assembly to have many smaller dependencies, or for a program to be split into multiple smaller packages. This makes modifying a definition in one assembly a significantly more involved process, as it may require changing other images that depend on it as well. Consumers of AsmResolver would therefore need a lot of extra code to address these issues. These fixes usually happen in a very ad-hoc manner as well, and things can easily be forgotten. There is a need for a more rigorous way of processing multiple files; a way that is more aware of the context that a single binary resides in.

This document details some design philosophies of a potential new feature called AsmResolver.Workspaces. The main goal of AsmResolver.Workspaces is to make it easier to analyze and process multiple files at once with relatively little code from the user's perspective. It does so by loading binaries into a workspace. This workspace is then indexed, analyzed, and lifted into higher abstractions that expose the implicit relationships between various components of assemblies. The index can then be queried for important information, such as finding metadata objects that are related to each other in some way or another.

The Workspace

A workspace is a collection of input binaries that are in some ways related to each other and need to be analyzed or processed at the same time. All binaries loaded into the workspace are aware of each other, meaning changes in one assembly may automatically propagate to other assemblies as well. Essentially, binaries in a workspace are similar to what projects are in a solution in Visual Studio.

A workspace maintains an Index. This is essentially a (huge!) graph where each node represents a single object put into the workspace, and edges in between nodes encode relationships between these objects. It serves as a knowledge base for anything that is put into the workspace for analysis and processing. It is also how related objects are discovered, no matter how distant the relationship between the two is.
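To make the shape of such a graph concrete, the following is a minimal sketch of one way the Index could be modelled. All names here (`WorkspaceIndex`, `IndexNode`, and the string-typed relation labels) are assumptions made for illustration; they are not part of AsmResolver, and a real design would likely use typed relation objects rather than strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical names throughout: nothing here is an existing AsmResolver API.
public sealed class IndexNode
{
    // Outgoing edges, grouped by the name of the relation they encode.
    private readonly Dictionary<string, HashSet<IndexNode>> _outgoing = new();

    public IndexNode(object subject) => Subject = subject;

    // The object this node represents (a member, an instruction, a XAML tag, ...).
    public object Subject { get; }

    internal void AddEdge(string relation, IndexNode target)
    {
        if (!_outgoing.TryGetValue(relation, out var targets))
            _outgoing[relation] = targets = new HashSet<IndexNode>();
        targets.Add(target);
    }

    // Returns the direct neighbours reachable via the given relation.
    public IEnumerable<IndexNode> GetRelated(string relation) =>
        _outgoing.TryGetValue(relation, out var targets)
            ? targets
            : Enumerable.Empty<IndexNode>();
}

public sealed class WorkspaceIndex
{
    // Nodes are keyed by the subject's own Equals/GetHashCode.
    private readonly Dictionary<object, IndexNode> _nodes = new();

    // Looks up the node for an object, creating it on first use.
    public IndexNode GetOrCreateNode(object subject)
    {
        if (!_nodes.TryGetValue(subject, out var node))
            _nodes[subject] = node = new IndexNode(subject);
        return node;
    }

    // Adds a directed edge labelled with the relation between two objects.
    public void Link(object source, string relation, object target) =>
        GetOrCreateNode(source).AddEdge(relation, GetOrCreateNode(target));
}
```

Because nodes hold arbitrary objects and edges are grouped per relation, different analyzers could add their own relations to the same node without knowing about each other.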

Key Features

The Workspace's main responsibilities and design philosophies can be summarized in the following key points:

  • Object and relation type neutral: The Index can store any type of cross reference that links arbitrary objects together in (multiple) arbitrary ways. This means it is possible to link metadata members to other members, but also to other types of objects/structures such as instructions in a method body or tags in a XAML file. This also means that different analyzers can add or introduce different relations between objects.

  • Easy lookup of objects: Once the workspace has been indexed, it is possible to efficiently obtain metadata about any indexed object (i.e. look up nodes in the graph) with very little extra syntactical overhead / noise.

  • Easy traversal of the Index Graph: Once the workspace has been indexed, it is possible to efficiently traverse all edges in the graph. This means that it is easy to obtain related objects (and by extension, related objects to the related objects and so forth...) in a type safe manner.

  • Shared context for inter-modular symbol resolution: Once assemblies are added to a workspace, they will share the same context for symbol resolution. For .NET modules, this means that they will share the same IAssemblyResolver instance, which first looks into the workspace itself for candidate assemblies before it looks into the GAC or other runtime installation directories. Calling Resolve() on, for example, a MemberReference that refers to a method within the same workspace will then result in a MethodDefinition object originating from the very same AssemblyDefinition object that represents the declaring assembly in the workspace.
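The workspace-first lookup order described in the last point could be sketched as follows. This is only an illustration under stated assumptions: `AssemblyStub` and `WorkspaceFirstResolver` are hypothetical stand-ins that mimic the idea of a resolver, and they do not reproduce the actual members of AsmResolver's IAssemblyResolver.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a loaded assembly; not an AsmResolver type.
public sealed record AssemblyStub(string Name);

public sealed class WorkspaceFirstResolver
{
    private readonly Dictionary<string, AssemblyStub> _workspace = new();

    // Fallback lookup (e.g. GAC or runtime directories); returns null if not found.
    private readonly Func<string, AssemblyStub?> _fallback;

    public WorkspaceFirstResolver(Func<string, AssemblyStub?> fallback) =>
        _fallback = fallback;

    public void AddToWorkspace(AssemblyStub assembly) =>
        _workspace[assembly.Name] = assembly;

    public AssemblyStub? Resolve(string name) =>
        _workspace.TryGetValue(name, out var assembly)
            ? assembly          // the very same instance that lives in the workspace
            : _fallback(name);  // only consulted when the workspace has no candidate
}
```

The important property is that a lookup for a workspace assembly returns the instance already in the workspace, so any definition reached through it is the one the user is editing.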

Analyzers

To build up an Index, target assemblies need to be analyzed first. To do so, a Workspace maintains the Analysis Queue, as well as a collection of analyzers sorted by object type. When an analysis is initiated, all assemblies put into the workspace are put in the Analysis Queue. Objects are then repeatedly dequeued from this queue, and then dispatched to an analyzer specifically tailored for the type of the object. Analyzers then register objects in the Index and/or link objects within the Index together, and schedule any newly discovered objects for analysis by putting them in the Analysis Queue. The analysis stops when the Analysis Queue is empty.
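The loop described above is essentially a worklist algorithm. A minimal sketch, assuming hypothetical `Workspace` and `AnalysisContext` types, dispatch on the exact runtime type of each object, and a policy where every object is scheduled at most once (which is only one possible answer to the open questions further down):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the worklist loop; none of these names are AsmResolver API.
public sealed class AnalysisContext
{
    private readonly Queue<object> _queue = new();
    private readonly HashSet<object> _scheduled = new();

    // Stand-in for the Index: a flat list of discovered links.
    public List<(object Source, string Relation, object Target)> Links { get; } = new();

    // Schedules an object once; repeated requests are ignored (an assumption).
    public void Schedule(object subject)
    {
        if (_scheduled.Add(subject))
            _queue.Enqueue(subject);
    }

    public bool TryDequeue(out object subject) => _queue.TryDequeue(out subject);
}

public sealed class Workspace
{
    private readonly Dictionary<Type, List<Action<object, AnalysisContext>>> _analyzers = new();
    private readonly List<object> _roots = new();

    public void AddRoot(object root) => _roots.Add(root);

    // Registers an analyzer for objects whose runtime type is exactly T.
    public void AddAnalyzer<T>(Action<T, AnalysisContext> analyzer)
    {
        if (!_analyzers.TryGetValue(typeof(T), out var list))
            _analyzers[typeof(T)] = list = new List<Action<object, AnalysisContext>>();
        list.Add((subject, context) => analyzer((T) subject, context));
    }

    public AnalysisContext Analyze()
    {
        var context = new AnalysisContext();
        foreach (var root in _roots)
            context.Schedule(root);

        // Keep dispatching until no analyzer schedules anything new.
        while (context.TryDequeue(out var subject))
        {
            if (_analyzers.TryGetValue(subject.GetType(), out var analyzers))
            {
                foreach (var analyzer in analyzers)
                    analyzer(subject, context);
            }
        }

        return context;
    }
}
```

Termination follows from the scheduled set: each object enters the queue at most once, so the loop ends once all reachable objects have been visited.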

Typical Analyzers

Analyzers that could be implemented may include but are not limited to:

  • Pairing member references to their respective member definitions in the declaring assembly.
  • Linking overridden methods to their respective virtual base methods.
  • Mapping references in XAML strings to the referenced member definitions.

For example, consider a workspace that defines two assemblies with types in the following manner:

// Assembly 1:
public interface IMyInterface
{
    void M();
}

public class MyClass1 : IMyInterface
{
    public void M() { ... } 
}

public class MyClass2 : IMyInterface
{
    public virtual void M() { ... } 
}

public class MyDerivedClass : MyClass2
{
    public override void M() { ... } 
}

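// Assembly 2:
// The containing class is added so the snippet compiles; its name is arbitrary.
public static class Program
{
    public static void Main()
    {
        IMyInterface obj = new MyClass1();
        obj.M();
    }
}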

If we zoom in on IMyInterface::M(), then, after the analysis has completed, the index graph may look like the following:

[Figure: example-index, the index graph around IMyInterface::M()]

As can be seen from the edges in the graph, references are paired with definitions, and base members are linked to any overrides that may exist in the workspace. These higher-level abstractions would then allow for easy lookup of relevant metadata in the workspace.
A nice by-product of this is that the Index can automatically serve as a cache for member definitions resolved using the Resolve() method.

Typical Anti-Pattern Analyzers

It is important that the Index stays minimal. By this we mean that the Index is only supposed to extend our current knowledge about a certain object, and should not mimic the structure that is already exposed by the object itself. An Index does not and should never try to reinvent the wheel. Therefore, when writing analyzers, the following should be kept in mind:

  • "Basic" properties that are already exposed by the object itself should not be added to the Index graph.
    Analyzers that connect metadata members to type definitions via an is-a-member-in-type relation are redundant, as this information is already directly exposed by the member.DeclaringType property. Analyzers that combine multiple properties, however, are allowed. For example, consider an analyzer that connects types together via an is-assignable-to relation. Since this relation emerges from both type.BaseType and type.Interfaces, it is a generalization.

  • Flattening of the graph should be avoided.
    An Index should only store direct relationships between objects, and not link objects together based on transitive properties. Transitive properties emerge from the entire structure of the graph, and should not be localized to single nodes. For example, consider again the is-assignable-to analyzer. While values of type System.IO.MemoryStream are indeed assignable to variables of type System.Object, this already follows from the fact that they are also assignable to a System.IO.Stream. Therefore, adding an extra edge from System.IO.MemoryStream to System.Object (as depicted in the left Index graph below) is redundant, and also destroys the hierarchical nature introduced by the inheritance. A more favourable structure would be the Index graph on the right, where System.IO.MemoryStream is indirectly connected to System.Object via is-assignable-to edges.
    [Figure: is-assignable-to (left: flattened Index graph; right: preferred hierarchical Index graph)]
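A consumer that needs the transitive answer can still obtain it by walking the direct edges at query time. The sketch below assumes a deliberately simplified hierarchy that mirrors the example in the text (interfaces and any intermediate base types are left out), and the `Assignability` class is hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class Assignability
{
    // Direct is-assignable-to edges only. Simplified to match the example in the
    // text; the real types have additional interfaces and bases.
    private static readonly Dictionary<string, string[]> DirectEdges = new()
    {
        ["System.IO.MemoryStream"] = new[] { "System.IO.Stream" },
        ["System.IO.Stream"] = new[] { "System.Object" },
    };

    // Answers the transitive question by depth-first traversal of direct edges.
    public static bool IsAssignableTo(string source, string target)
    {
        var visited = new HashSet<string>();
        var agenda = new Stack<string>();
        agenda.Push(source);

        while (agenda.Count > 0)
        {
            string current = agenda.Pop();
            if (current == target)
                return true;
            if (!visited.Add(current))
                continue; // already expanded

            if (DirectEdges.TryGetValue(current, out var next))
            {
                foreach (string type in next)
                    agenda.Push(type);
            }
        }

        return false;
    }
}
```

No MemoryStream-to-Object edge is stored, yet `IsAssignableTo("System.IO.MemoryStream", "System.Object")` still succeeds by following the two direct edges.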

Analyzer Profiles

In many cases, users probably want to use similar sets of workspace analyzers to build up the Index graph. The idea here is to put related analyzers into named groups of analyzers called Analyzer Profiles. These profiles can then be loaded into the workspace prior to initiating the indexing procedure. The default packages would also define a set of standard analyzer profiles that people can base their set of analyzers on, making it easier to get things started as a user.
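As a rough illustration of what a profile could look like, the sketch below models it as a named list of analyzers that can be extended into a custom profile. The types `IAnalyzer`, `NamedAnalyzer` and `AnalyzerProfile`, as well as the analyzer names used with them, are made up for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical API shape; not part of AsmResolver.
public interface IAnalyzer
{
    string Name { get; }
}

public sealed record NamedAnalyzer(string Name) : IAnalyzer;

public sealed class AnalyzerProfile
{
    private readonly List<IAnalyzer> _analyzers = new();

    public AnalyzerProfile(string name) => Name = name;

    public string Name { get; }

    public IReadOnlyList<IAnalyzer> Analyzers => _analyzers;

    // Adds an analyzer and returns the profile to allow chaining.
    public AnalyzerProfile With(IAnalyzer analyzer)
    {
        _analyzers.Add(analyzer);
        return this;
    }

    // Creates a new profile that starts with a copy of this profile's analyzers.
    public AnalyzerProfile Extend(string name)
    {
        var derived = new AnalyzerProfile(name);
        derived._analyzers.AddRange(_analyzers);
        return derived;
    }
}
```

A user could then start from a standard profile and add to it, for instance `standard.Extend("custom").With(new NamedAnalyzer("xaml"))`, without modifying the standard profile itself.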

Open questions

  • Should it be possible to queue objects for analysis multiple times? If done incorrectly, this can quickly cause infinite loops that seem inexplicable or are difficult to debug. However, it may be necessary to perform the analysis in multiple phases. If so, what phases should we define, or should this also be something that is up to the user?
  • Should the Analysis Queue be some kind of priority queue? Some analyzers may rely on other analyzers to lay a basic foundation of linkage between other objects. This may mean that some objects need to be analyzed before others. How do we control this?
  • Should we distinguish between simple crawlers and actual analyzers? In this scenario, crawlers would be analyzers that simply schedule the next objects in the Analysis Queue, while the actual analyzers put links between objects. The advantage here is that the same set of crawlers can always be used as a foundation for an analyzer profile, while specific analyzers that actually put links between objects can be included or excluded without having to worry about the coverage of the analysis. A downside, however, is that this may vastly increase the number of analyzers we need to define (effectively doubling it).
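To give the priority queue question something concrete to discuss, here is a small sketch of one option, not a proposed answer: work items carry a phase number, and a `System.Collections.Generic.PriorityQueue<TElement, TPriority>` (available since .NET 6) dequeues lower phases first. The phase constants and item descriptions are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class PhasedQueueDemo
{
    // Lower phase numbers are dequeued first; the phase names are hypothetical.
    public const int CrawlPhase = 0;
    public const int LinkPhase = 1;

    public static List<string> DrainInPriorityOrder()
    {
        var queue = new PriorityQueue<string, int>();

        // Enqueued out of order on purpose.
        queue.Enqueue("link overrides of MyClass2::M", LinkPhase);
        queue.Enqueue("crawl Assembly 1", CrawlPhase);
        queue.Enqueue("crawl Assembly 2", CrawlPhase);

        var order = new List<string>();
        while (queue.TryDequeue(out string item, out _))
            order.Add(item);

        // Both crawl items come out before the link item. The relative order of
        // the two crawl items is not guaranteed, as PriorityQueue is not stable.
        return order;
    }
}
```

One consequence worth noting is that ordering within a phase would need an extra tie-breaker (such as a sequence number in the priority) if analyzers depend on it.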

Motivating Example Use-Cases

Renaming Engine

Perhaps one of the most important properties of a function, method, variable or type is its name. The name of a symbol (hopefully) conveys its purpose, making programs much easier to understand. This, however, is a double-edged sword: on the one hand, it makes it easier for developers to debug their programs, while on the other hand it is just as valuable to a reverse engineer trying to figure out the inner workings of a program. This is why many obfuscators for .NET feature a renaming service that randomizes the names of all symbols in the target assembly.

In theory, AsmResolver supports renaming of symbols. After all, all types representing named metadata members expose a mutable Name property. However, just changing the value of this property is not always enough to produce a working assembly. While the .NET metadata file format references internal metadata through metadata tokens (and thus generally does not really worry about the names of these members), references to public symbols defined in external modules are resolved by their name and signature. This is problematic from the perspective of an obfuscator developer that wants to use AsmResolver.DotNet as a metadata back-end. It implies that references need to be kept in sync, which can be forgotten easily. For example, if assembly A defines type T, and assembly B references this type via a TypeReference, then assigning a new name to the TypeDefinition in A representing T would break the semantics of the TypeReference referencing it in B. For a successful renaming of the definition, a (de)obfuscator would therefore need to know which assemblies reference its declaring assembly, and update all the rows in the type reference table to account for the name change. Currently, AsmResolver does not do this automatically (nor is it really able to know what needs to be updated to begin with).

Things become even more problematic if we extend our views and look at assemblies that implement Object Oriented Programming (OOP) principles. The .NET runtime requires that an overridden method within a type has the same name as the virtual method defined in its base class. Furthermore, an assembly may also indirectly reference members by name in a custom attribute's signature, via System.Reflection or by the use of markup languages (e.g. XAML in WPF). A (de)obfuscator therefore needs to be very careful of the fact that renaming these might mean that seemingly unrelated attributes, derivative classes, strings in code or resource files may need to be updated as well.

However, with a workspace index that exposes the inter-modular relationships between all these types of objects, it is possible to simply "look up" all related objects that reference these members either directly or indirectly by name.
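A rough sketch of how a renamer might use such an index: symbols whose names must stay identical are connected by an undirected "name-bound" relation, and a rename walks that relation transitively. `NamedSymbol` and `RenameIndex` are hypothetical stand-ins for metadata members and the Index, and the assumption that every bound symbol simply takes the new name ignores cases such as explicit interface implementations.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for any metadata object with a mutable Name.
public sealed class NamedSymbol
{
    public NamedSymbol(string owner, string name)
    {
        Owner = owner;
        Name = name;
    }

    public string Owner { get; }
    public string Name { get; set; }

    public override string ToString() => $"{Owner}::{Name}";
}

public sealed class RenameIndex
{
    // Keys use reference equality, so mutating Name does not disturb the lookup.
    private readonly Dictionary<NamedSymbol, List<NamedSymbol>> _nameBound = new();

    // Records that two symbols must keep identical names (undirected).
    public void BindNames(NamedSymbol a, NamedSymbol b)
    {
        Neighbours(a).Add(b);
        Neighbours(b).Add(a);
    }

    private List<NamedSymbol> Neighbours(NamedSymbol symbol)
    {
        if (!_nameBound.TryGetValue(symbol, out var list))
            _nameBound[symbol] = list = new List<NamedSymbol>();
        return list;
    }

    // Renames the symbol and everything transitively bound to it.
    // Returns the number of symbols that were updated.
    public int Rename(NamedSymbol symbol, string newName)
    {
        var visited = new HashSet<NamedSymbol>();
        var agenda = new Queue<NamedSymbol>();
        agenda.Enqueue(symbol);

        while (agenda.Count > 0)
        {
            var current = agenda.Dequeue();
            if (!visited.Add(current))
                continue;

            current.Name = newName;
            foreach (var neighbour in Neighbours(current))
                agenda.Enqueue(neighbour);
        }

        return visited.Count;
    }
}
```

Applied to the earlier IMyInterface example, renaming MyDerivedClass::M would reach MyClass2::M, IMyInterface::M, MyClass1::M and the reference in Assembly 2, while leaving unrelated methods that happen to be called M untouched.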

Refactoring Engine

Similar to the renaming of members, we may also need to make more structural changes. For example, adding methods to or removing methods from an abstract class or interface has serious implications for any of its derived classes. However, if we are able to query the Index for all types that inherit from the base class or implement the interface, this should make things a lot easier.

Scheduling, Logistics and Final Comments

The addition of AsmResolver.Workspaces would pave the way for many more higher-level applications to be written using AsmResolver. While this is exciting, it would also mean that the project goes beyond the main goal that AsmResolver tries to achieve ("just" reading and writing executable files). With workspaces, we actually step into a new world of adding higher-level interpretations to executable images. This is not entirely new, as we have already been doing this for a while (especially with the introduction of e.g. ImpHash and TypeRefHash, as well as static memory layout detection in the AsmResolver.DotNet.Memory namespace). However, this new feature introduces interpretations of a much higher caliber. It may actually grow large enough to warrant its own repository, separate from the base AsmResolver repository. We'll have to see how things evolve here.

Even though this feature is both exciting and ambitious, attaching interpretations to metadata is a secondary goal of AsmResolver, so it is currently not the highest priority on the to-do list. As a result, it will probably reside on a separate branch for a while until it is eventually merged into the main branches. A merge of initial versions of workspaces may therefore be delayed until a major version bump (5.0 may be ambitious but could be possible; 6.0 is more realistic).
