Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signature recognition #769

Open
wants to merge 7 commits into
base: Signature-recognition
Choose a base branch
from

Conversation

uxmal
Copy link
Owner

@uxmal uxmal commented Apr 11, 2019

No description provided.

@uxmal
Copy link
Owner Author

uxmal commented Apr 11, 2019

@Cairn23: first of all, thanks for the contribution, especially of such an ambitious new feature. It will be a great addition to Reko.

It is the nature of code reviews that they are critical in tone. This may be discouraging to newcomers to the project, who aren't familiar with the coding style and design of the Reko project. Please take the code review in the spirit it is offered. I encourage you to be curious about the review feedback and participate in discussion to clarify coding and design choices. Ultimately, we're all working to improve the code base.

This is a large PR, with many file modifications. This makes the reviewing of the code changes more challenging. It also means that the review process will take longer, perhaps a couple of days. Please have patience as we work through the files.

When submitted, a PR must be mergeable with the target branch (in this case, master). This is currently not the case. I encourage you to do a git pull + merge to get the latest changes of master into your Signature-recognition branch.

When submitted, a PR must furthermore pass all unit tests and pass all the tests in the regression suite (under $REKO/subjects). Currently this is not the case. Once you've fixed the merge conflics, please run all the unit tests and regression tests to make sure no failures have been introduced.

One specific note: a big refactoring of the ElfLoader.cs file was done, but with code changes rolled in. This makes it much harder to review the changes. In the future, please try to make refactorings (that don't change functionality) in separate commits from feature/bugfix commits.

Thanks again, and stay tuned as I work through this PR.

@uxmal uxmal changed the base branch from master to Signature-recognition April 11, 2019 13:20
Copy link
Owner Author

@uxmal uxmal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Cairn23: I've added comments to this very large PR. Because of the sheer size of it, it's unlikely that I can merge it into master immediately. Instead, please go through the changes requested and address them. If you have questions or concerns, you can reach me on gitter. Perhaps we should start a private chat there to avoid spamming the general Reko channel.

My greatest concern is that there are no tests to validate anything related to the byte signatures. Before I can merge this to master, I want to see unit tests testing:

  • loading byte signatures from a (simulated) file
  • saving byte signatures to a file
  • generate byte signatures from an (ELF|COFF|what have you) file with symbolic information in it.
  • after loading + scanning a file, apply signatures and validate that procedure names are changed.

I also need to have a discussion with you regarding how a user would generate and use these signature files. Clearly, there should be a way to use "shared" signature files -- such files would be located near the Reko executales. But I also think that there should be a way for a user to generate "custom" files and place them alongside the Reko project file.

There are other issues to consider, like performance, user configuration etc. But those will be address in other PR's.

This PR is the fruit of a lot of work, congratulations. Now it's time to polish it up!

@@ -0,0 +1,59 @@
============================
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider renaming this file to the md extension and use Markdown. For instance, this could be changed to:

# What are signature files?

src/tools/makesigs/Signatures.txt Show resolved Hide resolved
src/Core/Core.csproj Outdated Show resolved Hide resolved
src/Core/ProcedureBase.cs Outdated Show resolved Hide resolved
src/Core/ProcedureBase.cs Outdated Show resolved Hide resolved
@@ -0,0 +1,398 @@
using Reko.ImageLoaders.Coff;
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also looks like duplicated code of the ar loader.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not apart of the project

string sourceFile;
string destFile;

Console.WriteLine("Reko static library signature generator");
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is chatty. Prefer quiet tools that do their jobs inobtrusively.

src/tools/makesigs/Program.cs Outdated Show resolved Hide resolved
src/tools/makesigs/Program.cs Show resolved Hide resolved
@@ -0,0 +1,56 @@
#region License
/*
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this looks like duplicated code.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not apart of the project

@uxmal
Copy link
Owner Author

uxmal commented Apr 11, 2019

I've created a new branch Signature-recognition in my repo to which this PR will be merged. I'm expecting work will need to continue on this branch after the PR is accepted; it's not quite ready for master yet.

@uxmal
Copy link
Owner Author

uxmal commented Apr 11, 2019

There were 90 comments in my preliminary pass over the PR. The ball is in your court now, @Cairn23 :)

@ptomin
Copy link
Collaborator

ptomin commented Apr 11, 2019

I do not like idea to change interface of ProcedureBase. I think, better way is adding library procedures to Program.ImageSymbols. New ProvenanceType field can be added to ImageSymbol class

@Cairn23
Copy link

Cairn23 commented Apr 12, 2019

The conflict is noting to do with the work I have done.

@uxmal
Copy link
Owner Author

uxmal commented Apr 12, 2019

The conflict is noting to do with the work I have done.

You refactored ElfLoader.cs, moving pieces of it into two separate files. In the meantime, changes were made to ElfLoader.cs. The changes have to be accurately merged into your new ElfLoader.cs, ElfLoader32.cs and ElfLoader64.cs.

@Cairn23
Copy link

Cairn23 commented Apr 14, 2019

Merge of of Uxmal master with this branch carried out. resolved issues in ELFLoader.cs

@uxmal
Copy link
Owner Author

uxmal commented Apr 15, 2019

I do not like idea to change interface of ProcedureBase. I think, better way is adding library procedures to Program.ImageSymbols. New ProvenanceType field can be added to ImageSymbol class

Pavel can you explain what you don't like about the change. Perhaps there is another way to deal with this?

@ceeac
Copy link

ceeac commented Apr 26, 2019

I have not looked at the PR in detail, but I think using a tree structure for matching the signatures is faster and more flexible. See for example how IDA does it (conceptually):
https://www.hex-rays.com/products/ida/tech/flirt/in_depth.shtml

@uxmal
Copy link
Owner Author

uxmal commented Apr 26, 2019

I have not looked at the PR in detail, but I think using a tree structure for matching the signatures is faster and more flexible. See for example how IDA does it (conceptually):
https://www.hex-rays.com/products/ida/tech/flirt/in_depth.shtml

@ceeac: indeed, using a trie or a suffix array to perform the pattern matching is likely to be performant. However, the scope of this PR is larger than the algorithm used to perform the actual pattern matching. It involves incorporation pattern recognition into the Reko flow, user interface changes, new file formats, etc.

Before this PR can be merged there are some issues left to be resolved. There are some unaddressed review comments (see above). The file format for the signature files needs to be properly documented More seriously, there are no unit tests or regression tests covering the new code. If I wanted to refactor the code to use efficient pattern matching structures like tries or suffix trees, I wouldn't be aware that I was breaking anything.

@Cairn23: what can I do to assist in completing the PR?

@Cairn23
Copy link

Cairn23 commented Apr 27, 2019

You need to understand the concepts before even thing about changing the pattern matching. I have compared this to IDA's Flirt and it is the same speed. Given that that doing a full decomplication of one of my test aps, the signature process is only taking less than a 20th of the overall time, I would not worry. if you attempt to use a trie how are you going to account for wild cards, In the pre work I did with this using an array of signatures I was able to create conditions that would mean other then the first couple of bytes you could completely miss match as every byte would have a wild card option. I did start to look at the unit tests, but found that they did not add any value, the same as all the other loader unit test which were just checking the that Image loader code was working placing the data in to the header, there are nor REAL tests within any of the loaders. If you want the tests, please set the correct standard first, before going at people.

@Cairn23
Copy link

Cairn23 commented Apr 27, 2019

The best thing to do would be to get it added to the master with a beta warning whilst people use it. It is easy to prevent this working by ensuring the signatures are removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants