Coverage Guided / Mutation Based Fuzzing #687

d-xo · 2021-07-02T14:09:57Z

Still very much WIP (and currently based off a pretty outdated commit).

This PR introduces a coverage guided / mutation based approach to fuzzing. All fuzz tests are now run with coverage enabled. We store each path seen, along with the input that we used to get there. When generating calldata for a fuzz run, we sometimes use the existing strategy (random), and sometimes choose instead to mutate one of the examples from the corpus of visited paths.

This should hopefully help us get even deeper inside complex contracts.
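As a rough illustration of the choice described above (not the code in this PR), the selection between the random strategy and corpus mutation could look like the sketch below. `Input`, `Path`, `nextInput`, and the explicit `roll` / `pick` arguments are all hypothetical stand-ins: the real code works with `AbiValue` and its own random source.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for illustration; the PR itself uses AbiValue
-- for inputs and (W256, Int) pairs for trace entries.
type Input = [Int]
type Path  = [(Integer, Int)]

-- | Pick the input for the next fuzz run.
-- The caller supplies the randomness ('roll' in [0, 100) and 'pick'),
-- so this function stays pure.
nextInput
  :: Int                 -- ^ percentage of runs that should mutate (0-100)
  -> Int                 -- ^ roll in [0, 100)
  -> Int                 -- ^ index used to choose a corpus entry
  -> Input               -- ^ input produced by the existing random strategy
  -> (Input -> Input)    -- ^ mutator
  -> Map.Map Path Input  -- ^ corpus: path -> input that reached it
  -> Input
nextInput mutationPct roll pick fresh mutate corpus
  -- Fall back to the random strategy when there is nothing to mutate yet,
  -- or when the roll lands outside the mutation share.
  | Map.null corpus || roll >= mutationPct = fresh
  | otherwise =
      mutate (snd (Map.elemAt (pick `mod` Map.size corpus) corpus))
```

Keeping the random draws outside the function is only a convenience for the sketch; it makes the branch taken easy to test.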

Still TODO:

- rebase on master
- general tidyup
- generate coverage reports
- use the symbolic execution engine to prefill the corpus with interesting calldata
- apply a weighting to some examples (e.g. paths where the symbolic execution engine timed out can be targeted for fuzzing)

d-xo · 2021-07-02T14:10:56Z

src/hevm/src/EVM/ABI.hs

@@ -134,6 +143,10 @@ data AbiType
  | AbiTupleType        (Vector AbiType)
  deriving (Read, Eq, Ord, Generic)

+instance ToJSON AbiType


I think all these ToJSON/FromJSON instances are not needed anymore; they're just left over from a time when I was serializing the corpus as JSON.

d-xo · 2021-07-02T14:12:07Z

src/hevm/src/EVM/UnitTest.hs

@@ -101,6 +106,12 @@ data TestVMParams = TestVMParams
  , testChainId       :: W256
  }

+-- | For each tuple of (contract, method) we store the calldata required to
+-- reach each known path in the method
+type Corpus = Map (W256,Text) (Map [(W256, Int)] AbiValue)


W256 is the codehash
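
To make the shape of that nested map concrete, here is a minimal sketch of recording a newly seen path. It is not the PR's code: `CodeHash` is an `Integer` stand-in for `W256`, the stored value is left polymorphic instead of `AbiValue`, `recordTrace` is a hypothetical helper, and keeping the first input seen for a path (rather than overwriting it) is an assumption.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

type CodeHash = Integer                 -- stand-in for W256
type Trace    = [(CodeHash, Int)]       -- (codehash, bytecode index)

-- Same nesting as the Corpus type in the diff, with a polymorphic value
-- in place of AbiValue: (contract codehash, method) -> trace -> input.
type Corpus v = Map (CodeHash, String) (Map Trace v)

-- | Store the input for a trace unless the trace is already known.
-- Returns the updated corpus and whether the trace was new.
recordTrace
  :: (CodeHash, String) -> Trace -> v -> Corpus v -> (Corpus v, Bool)
recordTrace key trace input corpus =
  let known = Map.findWithDefault Map.empty key corpus
  in if Map.member trace known
       then (corpus, False)   -- assumption: the first input for a path wins
       else (Map.insert key (Map.insert trace input known) corpus, True)
```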

d-xo · 2021-07-02T14:18:00Z

src/hevm/src/EVM/UnitTest.hs

+
+type TraceIdState = (VM, [(W256, Int)])
+
+-- | This interpreter is similar to interpretWithCoverage, except instead of


This comment is a lie: the hash accumulator was an old idea that actually ended up slowing things down. However, I did end up going with a much faster data structure for the traces. Instead of a MultiSet OpLocation, where OpLocation is a Contract and a bytecode index, we instead store a list of (codehash, bytecode index).

Using a codehash keeps the size of the corpus on disk small, and using a list means we can get O(1) insertion of new elements by just consing them onto the head of the list (instead of O(log n) for a MultiSet).
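
A minimal sketch of the consing idea, assuming an `Integer` stand-in for the codehash and hypothetical helper names (`visit`, `traceOf`); the real interpreter threads the trace through its own state rather than folding over a list:

```haskell
import Data.List (foldl')

type CodeHash = Integer              -- stand-in for W256
type Trace    = [(CodeHash, Int)]    -- (codehash, bytecode index)

-- | Record one executed location by consing it onto the trace: O(1).
visit :: Trace -> (CodeHash, Int) -> Trace
visit trace loc = loc : trace

-- | Build a trace from locations given in execution order.
-- Because each step conses onto the head, the most recently
-- executed location ends up first.
traceOf :: [(CodeHash, Int)] -> Trace
traceOf = foldl' visit []
```

The resulting trace is in reverse execution order, which does not matter when the trace is only used as a map key.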

With these optimizations in place, the extra overhead introduced by the serialization / instrumentation is somewhere between 10 and 15 percent.

Note that the use of codehash means we'll need some extra state if we want to show pretty coverage reports to the user (perhaps a mapping from codehash to SolcContract or smth in the Corpus?)
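
One possible shape for that extra state, purely as a sketch: a side table from codehash to contract, consulted when rendering a trace. `ContractName` is a hypothetical placeholder for whatever would actually be stored (e.g. a `SolcContract`), and `resolveTrace` is not an existing function.

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (mapMaybe)

type CodeHash = Integer   -- stand-in for W256

-- Hypothetical placeholder for the contract metadata a report would need.
newtype ContractName = ContractName String
  deriving (Eq, Show)

-- | Translate trace entries into (contract, bytecode index) pairs.
-- Entries whose codehash is missing from the table are dropped.
resolveTrace
  :: Map.Map CodeHash ContractName
  -> [(CodeHash, Int)]
  -> [(ContractName, Int)]
resolveTrace names =
  mapMaybe (\(h, pc) -> fmap (\n -> (n, pc)) (Map.lookup h names))
```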

We probably want to rewrite the existing coverage based interpreter to use the new data structures before merging, but I was still just trying to move fast and figure out what worked best, so I used a separate one to start with.

oh also it's worth noting that using a MultiSet or a list instead of a Set for the traces means that traces that get further into a loop will be treated as a new trace, which seems like a nice thing.
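
To illustrate the point about loops, here is a small sketch comparing the two representations on a made-up run. `loopRun` and the bytecode indices are invented for the example; only the list-versus-Set contrast is the point.

```haskell
import qualified Data.Set as Set

type Loc = (Integer, Int)   -- (codehash stand-in, bytecode index)

-- | Locations executed by a hypothetical loop whose body sits at
-- bytecode index 10 and runs n times.
loopRun :: Int -> [Loc]
loopRun n = (0xc0de, 0) : replicate n (0xc0de, 10) ++ [(0xc0de, 20)]

-- | Trace kept as a list: every repetition of the loop body is retained.
asList :: Int -> [Loc]
asList = loopRun

-- | Trace kept as a Set: repeated visits collapse into one element.
asSet :: Int -> Set.Set Loc
asSet = Set.fromList . loopRun
```

With these definitions, `asList 2` and `asList 3` are different keys, so the deeper run would be stored as a new trace, while `asSet 2` and `asSet 3` are equal and the deeper run would be ignored.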

transmissions11 · 2021-09-04T05:26:54Z

does this include constant mining? crytic/echidna#262

d-xo added 21 commits May 27, 2021 09:45

hevm: add mutators for abi values

535cf8b

hevm: UnitTest: coverage guided fuzzing

662173c

hevm: allow mutations percentage to be set from the command line

ace7bf6

hevm: UnitTest: rework representation of corpus to avoid an uneeded filter

d58267e

hevm: attempt serialization of corpus

d6a6294

hevm: UnitTest: skip hashing of traces before corpus insertion

e89fdd0

hevm: test: hlint + whitespace

17df0c4

hevm: test: compiler warnings

45580b7

hevm: test: test json serialization / deserialization routines

5fc658f

hevm: fix & test corpus serialization

85c495f

hevm: fix nix build

f97d256

hevm: UnitTest: reduce on disk corpus size

d42bafb

hevm: UnitTest: use blake3 for hashing corpus keys

10f6d07

hevm: UnitTest: no more word256Bytes

87fec27

hevm: UnitTest: replace coverage traces with a hash accumulator. ~7% speedup

6286f51

hevm: UnitTest: replace blake3 with xxhash (~6% improvement)

d6edbac

hevm: UnitTest: represent traces as a list (640% faster)

fa47e66

hevm: UnitTest: rm hashCall

537fed0

hevm: test: fix tests

8ed2665

hevm: dappTest: serialize corpus with cbor

cba0e7c

hevm: UnitTest: simplify runTest

2fb197a

gakonst mentioned this pull request Jan 17, 2022

Mutation testing foundry-rs/foundry#478

Open
