Intelligent Selection of Code Generation, Mutation and Seed Selection with Multi-Armed Bandit #343

DeamonSpawn · 2022-06-11T14:30:22Z

This PR adds a coverage-based guidance mechanism for code generation, mutation, and seed selection, with the goal of optimizing coverage growth. It uses Multi-Armed Bandit (MAB) algorithms to navigate the search space when selecting among these tasks.

Addresses issue #172.

Thesis with design and implementation:
Intelligent Code Generation/ Mutation to aid fuzzing of JavaScript engines

saelo

That's amazing, thanks! A warning up front, this is quite a large PR so it'll take me some time to get through it :D

Do you want me to run some (more) fuzzing sessions (against v8) for evaluation with this enabled vs disabled? I could probably do a few sessions up to 1B iterations (each time enabled vs. disabled). Would it make sense to test e.g. only MABCorpus or CodeGeneration at a time or is it ok to enable all MAB-ed "things" at the same time?

I think the first step towards merging this is to split up the PR into multiple smaller ones. I could imagine:

1-2 PRs for various unrelated fixes (see e.g. comments)
1 PR for the CodeGenerator changes (the CodeGenerationMode)
Either one big PR for the rest, or one PR per MAB "thing". I'm not yet sure which makes more sense

I think it'd also be good to have a short, high-level description of how the MAB algorithm works somewhere in the code, similar to https://github.com/googleprojectzero/fuzzilli/blob/main/Sources/Fuzzilli/Corpus/MarkovCorpus.swift (and also link to the paper of course!). Would it be possible to have one "generic" MAB implementation that is then used for the corpus, the code generators, and the mutators? From a quick look it seems like at least some of the MAB-related logic is duplicated in a few places. Probably it's not going to be completely generic, but maybe it'll be good enough with the right abstractions. WDYT?
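
To make the "generic MAB" idea concrete, here is a minimal sketch of what one shared abstraction could look like. It is not the code from this PR: the `Bandit` type and its methods are hypothetical names, the arm names in the usage lines are placeholders, and it uses plain UCB1 as a stand-in, which may differ from the algorithm described in the thesis. The reward is assumed to be 1.0 when an execution finds new coverage and 0.0 otherwise.

```swift
import Foundation

/// Hypothetical generic bandit, parameterized over the arm type so the same
/// logic could in principle serve mutators, code generators, or corpus entries.
/// Uses UCB1 as a stand-in selection rule (an assumption, not the PR's algorithm).
struct Bandit<Arm> {
    private(set) var arms: [Arm]
    private(set) var pulls: [Int]          // how often each arm was selected
    private(set) var rewardSums: [Double]  // accumulated reward per arm
    private var totalPulls = 0

    init(arms: [Arm]) {
        precondition(!arms.isEmpty, "need at least one arm")
        self.arms = arms
        self.pulls = Array(repeating: 0, count: arms.count)
        self.rewardSums = Array(repeating: 0.0, count: arms.count)
    }

    /// Returns the index of the arm to try next.
    func selectIndex() -> Int {
        // Try every arm once before applying the UCB1 formula.
        if let untried = pulls.firstIndex(of: 0) { return untried }
        var best = 0
        var bestScore = -Double.infinity
        for i in arms.indices {
            let mean = rewardSums[i] / Double(pulls[i])
            // Exploration bonus: shrinks as an arm is pulled more often.
            let bonus = (2.0 * log(Double(totalPulls)) / Double(pulls[i])).squareRoot()
            if mean + bonus > bestScore {
                best = i
                bestScore = mean + bonus
            }
        }
        return best
    }

    /// Records the observed reward for the arm at `index`.
    mutating func update(index: Int, reward: Double) {
        pulls[index] += 1
        rewardSums[index] += reward
        totalPulls += 1
    }
}

// Usage with placeholder arm names; a real integration would pass mutator,
// code generator, or corpus-entry objects instead of strings.
var mutatorBandit = Bandit(arms: ["MutatorA", "MutatorB"])
let choice = mutatorBandit.selectIndex()
mutatorBandit.update(index: choice, reward: 1.0)
```

With something along these lines, the corpus, the code generators, and the mutators would each own a `Bandit` instance over their own arm type, and only the reward computation would need to differ per use site.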

saelo · 2022-07-01T11:02:33Z

Cloud/GCE/push.sh

@@ -7,5 +7,5 @@ set -e

 source config.sh

-docker tag fuzzilli gcr.io/$PROJECT_ID/$CONTAINER_NAME
-docker push gcr.io/$PROJECT_ID/$CONTAINER_NAME
+docker tag $CONTAINER_NAME:latest $REGION-docker.pkg.dev/$PROJECT_ID/fuzzilli-docker-repo/$CONTAINER_NAME


I'd prefer if changes to these files could be a separate CL (if you want to include them) :)

I'll revert this change; it is not relevant to the PR. :)

saelo · 2022-07-01T11:08:54Z

Sources/FuzzilliCli/main.swift

-for fuzzer in instances {
-    fuzzer.sync {
-        fuzzer.start(runFor: numIterations)
+let master = fuzzer


Ah, is this an independent bug fix? Would the thread workers otherwise not get the initial corpus? This should probably be its own PR as well then.

This is done to synchronize the MAB state and the corpora across all workers.
Without this change, only interesting programs would be distributed to the worker nodes, not the MAB state or the compiled seeds.

DeamonSpawn · 2022-07-07T16:25:52Z

To answer your first question: I have performed tests on a single-node setup of Fuzzilli, without any distributed instances (no master-worker network or threading nodes) and without compiled/imported seeds in the corpus.
Each test ran for 24 hours.
I compared the MAB implementation against instances using the basic (default) corpus and the Markov corpus, respectively.
These tests cover MAB Code Gen\Mutator and MAB Corpus, both individually and combined.

Summary Observations:
After a period of 24 hours, the baseline instances (Basic and Markov) and the MAB implementations (MAB Code Gen\Mutator and MAB Corpus) all eventually converge to the same rate of coverage discovery.
In the isolated evaluation of MAB Code Gen\Mutator, the instance reaches the convergence point faster than Basic and Markov.
In the isolated evaluation of MAB Corpus, the instance shows a faster coverage growth rate than the baseline instances, with higher coverage discovered in early time intervals before convergence.
With MAB CodeGen\Mutator and MAB Corpus combined, the higher coverage growth rate and convergence rate indicate improved exploration of the corpus search space, reducing the time taken for coverage discovery.

I am waiting on the assessment of my Master's thesis before I upload my report with the details of my implementation.

Networked nodes have been tested locally and are very much capable of operating with MAB Corpus.
However, stats have not been collected for distributed nodes over 24 hours.

DeamonSpawn · 2022-07-07T16:35:17Z

Regarding the split of the PR, I can create the following 5 PRs:

Separation of Code Generation and Splicing (#346)
MAB for Mutator selection only (#348)
MAB for Code Generator selection in combination with Mutator selection
Changes in Corpus Protocol to allow compiled seeds
MAB Corpus implementation

DeamonSpawn · 2022-07-13T12:02:06Z

I have updated the original comment with the link to my thesis.

DeamonSpawn force-pushed the mab-final branch from 00568fa to 94ab474 Compare June 22, 2022 14:13

Intelligent Selection with Multi-Armed Bandit

d5b8e7a

DeamonSpawn force-pushed the mab-final branch from 94ab474 to d5b8e7a Compare June 22, 2022 17:35

saelo reviewed Jul 1, 2022

Revert file

1febeb4

Revert change seperately addresed in PR googleprojectzero#347

70d3008

saelo force-pushed the main branch from 510c80f to 59d96b0 Compare January 30, 2023 10:28
