Intelligent Selection of Code Generation, Mutation and Seed Selection with Multi-Armed Bandit #343
That's amazing, thanks! A warning up front, this is quite a large PR so it'll take me some time to get through it :D
Do you want me to run some (more) fuzzing sessions (against v8) for evaluation with this enabled vs disabled? I could probably do a few sessions up to 1B iterations (each time enabled vs. disabled). Would it make sense to test e.g. only MABCorpus or CodeGeneration at a time or is it ok to enable all MAB-ed "things" at the same time?
I think the first step towards merging this is to split up the PR into multiple smaller ones. I could imagine:
- 1-2 PRs for various unrelated fixes (see e.g. comments)
- 1 PR for the CodeGenerator changes (the CodeGenerationMode)
- Either one big PR for the rest, or one PR per MAB "thing". I'm not yet sure which makes more sense
I think it'd also be good to have a short, high-level description of how the MAB algorithm works somewhere in the code, similar to https://github.com/googleprojectzero/fuzzilli/blob/main/Sources/Fuzzilli/Corpus/MarkovCorpus.swift (and also link to the paper of course!). Would it be possible to have one "generic" MAB implementation that is then used for the corpus, the code generators, and the mutators? From a quick look it seems like at least some of the MAB-related logic is duplicated in a few places. Probably it's not going to be completely generic, but maybe it'll be good enough with the right abstractions. WDYT?
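To make the "generic MAB implementation" idea concrete, here is a minimal sketch of what a reusable bandit type could look like. It is not the code from this PR: the type name `UCB1Bandit`, its methods, and the choice of the UCB1 selection rule are assumptions made purely for illustration, and the PR may well use a different bandit algorithm and reward definition. The arm type is generic, so the same struct could in principle wrap corpus entries, code generators, or mutators.

```swift
import Foundation

/// Hypothetical generic bandit using the UCB1 rule (an assumption, not the PR's algorithm).
/// `Arm` can be any type: a seed, a code generator, a mutator, ...
struct UCB1Bandit<Arm> {
    let arms: [Arm]
    private(set) var pulls: [Int]          // how often each arm was selected
    private(set) var totalReward: [Double] // summed reward per arm
    private(set) var totalPulls = 0

    init(arms: [Arm]) {
        precondition(!arms.isEmpty, "need at least one arm")
        self.arms = arms
        self.pulls = Array(repeating: 0, count: arms.count)
        self.totalReward = Array(repeating: 0.0, count: arms.count)
    }

    /// Returns the index of the arm to try next.
    func selectIndex() -> Int {
        // Try every arm once before relying on the confidence bound.
        if let untried = pulls.firstIndex(of: 0) { return untried }
        var best = 0
        var bestScore = -Double.infinity
        for i in arms.indices {
            let mean = totalReward[i] / Double(pulls[i])
            // Exploration bonus shrinks as an arm is pulled more often.
            let bonus = (2.0 * log(Double(totalPulls)) / Double(pulls[i])).squareRoot()
            if mean + bonus > bestScore {
                bestScore = mean + bonus
                best = i
            }
        }
        return best
    }

    /// Records the observed reward (e.g. some measure of new coverage) for an arm.
    mutating func update(index: Int, reward: Double) {
        pulls[index] += 1
        totalReward[index] += reward
        totalPulls += 1
    }
}
```

A caller would alternate `selectIndex()` and `update(index:reward:)` once per fuzzing iteration. How the reward is derived from coverage feedback is left open here, since that is exactly the part that probably differs between the corpus, the generators, and the mutators.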
Cloud/GCE/push.sh
@@ -7,5 +7,5 @@ set -e

source config.sh

docker tag fuzzilli gcr.io/$PROJECT_ID/$CONTAINER_NAME
docker push gcr.io/$PROJECT_ID/$CONTAINER_NAME
docker tag $CONTAINER_NAME:latest $REGION-docker.pkg.dev/$PROJECT_ID/fuzzilli-docker-repo/$CONTAINER_NAME
I'd prefer if changes to these files could be a separate CL (if you want to include them) :)
I'll revert this change; it is not relevant to the PR. :)
for fuzzer in instances {
fuzzer.sync {
fuzzer.start(runFor: numIterations)
let master = fuzzer
Ah, is this an independent bug fix? Would the thread workers otherwise not get the initial corpus? This should probably be its own PR as well then.
This is done to synchronize the MAB state and the corpora across all workers.
Without this change, only interesting programs would be distributed to the worker nodes and not the MAB state and compiled seeds.
To answer your first question, Summary Observations: I am waiting on the assessment of my Master's thesis before I upload my report with the details of my implementation. Networked nodes have been tested locally and are very much capable of operating with MAB Corpus.
Regarding the split of the PR, I can create the following 5 PRs:
I have updated the original comment with the link to my thesis.
A coverage-based guidance mechanism for code generation, mutation, and seed selection, aimed at optimizing coverage growth. It uses Multi-Armed Bandit algorithms to navigate the search space when selecting among these tasks.
Addresses issue #172.
Thesis with design and implementation:
Intelligent Code Generation/ Mutation to aid fuzzing of JavaScript engines