Intelligent Selection of Code Generation, Mutation and Seed Selection with Multi-Armed Bandit #343

Open
wants to merge 3 commits into
base: main

DeamonSpawn
Contributor

@DeamonSpawn DeamonSpawn commented Jun 11, 2022

This PR adds a coverage-based guidance mechanism for the code generation, mutation, and seed selection tasks, with the goal of optimizing coverage growth. It uses Multi-Armed Bandit (MAB) algorithms to navigate the search space when selecting among these tasks.

Addresses issue #172.

Thesis with design and implementation:
Intelligent Code Generation/ Mutation to aid fuzzing of JavaScript engines

Collaborator

@saelo saelo left a comment

That's amazing, thanks! A warning up front, this is quite a large PR so it'll take me some time to get through it :D

Do you want me to run some (more) fuzzing sessions (against v8) for evaluation with this enabled vs disabled? I could probably do a few sessions up to 1B iterations (each time enabled vs. disabled). Would it make sense to test e.g. only MABCorpus or CodeGeneration at a time or is it ok to enable all MAB-ed "things" at the same time?

I think the first step towards merging this is to split up the PR into multiple smaller ones. I could imagine:

  • 1-2 PRs for various unrelated fixes (see e.g. comments)
  • 1 PR for the CodeGenerator changes (the CodeGenerationMode)
  • Either one big PR for the rest, or one PR per MAB "thing". I'm not yet sure which makes more sense

I think it'd also be good to have a short, high-level description of how the MAB algorithm works somewhere in the code, similar to https://github.com/googleprojectzero/fuzzilli/blob/main/Sources/Fuzzilli/Corpus/MarkovCorpus.swift (and also link to the paper of course!). Would it be possible to have one "generic" MAB implementation that is then used for the corpus, the code generators, and the mutators? From a quick look it seems like at least some of the MAB-related logic is duplicated in a few places. Probably it's not going to be completely generic, but maybe it'll be good enough with the right abstractions. WDYT?
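
To make the "generic MAB implementation" idea concrete, here is a minimal sketch of one possible abstraction. It is not the code from this PR and not an existing Fuzzilli API: the type name `EpsilonGreedyBandit`, its methods, and the arm names are all hypothetical, and epsilon-greedy is used only because it is the simplest bandit strategy to show, not because it is the algorithm the thesis uses. The point is that the arm type is generic, so the same logic could in principle back the corpus, the code generators, and the mutators.

```swift
// Hypothetical sketch: names are illustrative, not taken from the PR or from Fuzzilli.
// A generic epsilon-greedy bandit over any Hashable arm type (a seed, a generator, a mutator, ...).
struct EpsilonGreedyBandit<Arm: Hashable> {
    let arms: [Arm]
    let epsilon: Double
    private(set) var pulls: [Arm: Int] = [:]
    private(set) var meanReward: [Arm: Double] = [:]

    init(arms: [Arm], epsilon: Double = 0.1) {
        precondition(!arms.isEmpty, "need at least one arm")
        precondition((0.0...1.0).contains(epsilon), "epsilon must be in [0, 1]")
        self.arms = arms
        self.epsilon = epsilon
        for arm in arms {
            pulls[arm] = 0
            meanReward[arm] = 0.0
        }
    }

    /// With probability epsilon, explore a random arm; otherwise exploit the arm
    /// with the highest observed mean reward.
    func selectArm<G: RandomNumberGenerator>(using rng: inout G) -> Arm {
        if Double.random(in: 0..<1, using: &rng) < epsilon {
            return arms.randomElement(using: &rng)!
        }
        return arms.max { meanReward[$0]! < meanReward[$1]! }!
    }

    /// Record a reward for an arm (assumed here: 1.0 if the execution found new
    /// coverage, 0.0 otherwise) using an incremental mean update.
    mutating func update(arm: Arm, reward: Double) {
        guard let n = pulls[arm], let mean = meanReward[arm] else { return }
        pulls[arm] = n + 1
        meanReward[arm] = mean + (reward - mean) / Double(n + 1)
    }
}

// Usage with placeholder arm names.
var rng = SystemRandomNumberGenerator()
var mutatorBandit = EpsilonGreedyBandit(arms: ["mutatorA", "mutatorB"], epsilon: 0.1)
let chosen = mutatorBandit.selectArm(using: &rng)
mutatorBandit.update(arm: chosen, reward: 1.0)
```

A different selection strategy would only need to replace `selectArm` and `update`; the callers that pull an arm and report a reward would stay the same, which is where the duplicated logic could be factored out.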

@@ -7,5 +7,5 @@ set -e

source config.sh

docker tag fuzzilli gcr.io/$PROJECT_ID/$CONTAINER_NAME
docker push gcr.io/$PROJECT_ID/$CONTAINER_NAME
docker tag $CONTAINER_NAME:latest $REGION-docker.pkg.dev/$PROJECT_ID/fuzzilli-docker-repo/$CONTAINER_NAME
Collaborator

I'd prefer if changes to these files could be a separate CL (if you want to include them) :)

Contributor Author

I'll revert this change; it is not relevant to the PR. :)

for fuzzer in instances {
fuzzer.sync {
fuzzer.start(runFor: numIterations)
let master = fuzzer
Collaborator

Ah, is this an independent bug fix? Would the thread workers otherwise not get the initial corpus? This should probably be its own PR as well then.

Contributor Author

This is done to synchronize the MAB state and the corpora across all workers.
Without this change, only interesting programs would be distributed to the worker nodes and not the MAB state and compiled seeds.
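
As a rough illustration of what "synchronizing the MAB state" could involve, the sketch below round-trips a snapshot of bandit statistics through a serialized form, as a parent instance might do before handing it to a worker. Everything here is an assumption made for illustration: `BanditSnapshot`, its fields, and the two helper functions are hypothetical, and JSON is used only because it keeps the example self-contained, not because it is the wire format this PR or Fuzzilli uses.

```swift
import Foundation

// Hypothetical sketch: type and field names are illustrative, not the PR's.
// Per-arm statistics that a worker would need in order to continue from the
// parent's learned state instead of starting from scratch.
struct BanditSnapshot: Codable, Equatable {
    var pulls: [String: Int]
    var meanReward: [String: Double]
}

/// Parent side: serialize the current state so it can be sent to a worker.
func encodeSnapshot(_ snapshot: BanditSnapshot) throws -> Data {
    return try JSONEncoder().encode(snapshot)
}

/// Worker side: restore the state received from the parent.
func decodeSnapshot(_ data: Data) throws -> BanditSnapshot {
    return try JSONDecoder().decode(BanditSnapshot.self, from: data)
}
```

If only interesting programs were forwarded, each worker would start with empty statistics like these and would have to relearn which arms pay off.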

@DeamonSpawn
Contributor Author

To answer your first question,
I have performed tests for a single node setup of Fuzzilli without any distributed instances (no master-worker network or threading nodes) and no compiled/imported seeds used in the corpus.
Each test ran for 24 hours.
I tested the MAB implementation against instances using the basic (default) corpus and the Markov corpus respectively.
These tests cover MAB Code Gen/Mutator and MAB Corpus, both combined and individually.

Summary Observations:
The baseline instances (Basic and Markov) and the MAB implementations (MAB Code Gen/Mutator and MAB Corpus) eventually converge to the same rate of coverage discovery after 24 hours.
In the isolated evaluation of MAB Code Gen/Mutator, the instance reaches the convergence point faster than Basic and Markov.
In the isolated evaluation of MAB Corpus, coverage grows faster than in the baseline instances, with higher coverage discovered in the early time intervals before convergence.
With MAB Code Gen/Mutator and MAB Corpus combined, the higher coverage growth rate and convergence rate indicate improved exploration of the corpus search space, which reduces the time taken for coverage discovery.

I am waiting on the assessment of my Master's thesis before I upload my report with the details of my implementation.

Networked nodes have been tested locally and are very much capable of operating with MAB Corpus.
However, stats have not been collected for distributed nodes over 24 hours.

@DeamonSpawn
Contributor Author

DeamonSpawn commented Jul 7, 2022

Regarding the split of the PR, I can create the following 5 PRs:

@DeamonSpawn
Contributor Author

I have updated the original comment with the link to my thesis.
