Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to use Arbitrary generators? #150

Open
mwhicks1 opened this issue May 17, 2023 · 7 comments
Open

How to use Arbitrary generators? #150

mwhicks1 opened this issue May 17, 2023 · 7 comments

Comments

@mwhicks1
Copy link

I saw that PR #108 that includes Arbitrary generator support, but I didn't see corresponding instructions for how to leverage it.

In particular, I'd like to use values generated by an Arbitrary::<Type> rather than a TypeGenerator::<Type>. I'm not seeing docs on that. Issue #31 and the comments in PR #108 suggest using with_arbitrary... rather than with_type... but I don't see that function in the docs. I wondered if perhaps you needed to #derive a TypeGenerator and could point to an Arbitrary as the source, but I don't see that in the docs either.

@camshaft
Copy link
Owner

The with_arbitrary method is gated on the arbitrary feature:

bolero/bolero/src/lib.rs

Lines 261 to 262 in b1b519e

#[cfg(feature = "arbitrary")]
pub fn with_arbitrary<T: Debug + for<'a> bolero_generator::arbitrary::Arbitrary<'a>>(

We'll need to make it show up in docs and that it requires that feature.

But you can do that in your Cargo.toml:

bolero = { version = "0.9", features = ["arbitrary"] }

and then in the harness

#[derive(Arbitrary)]
struct MyType {
    a: u8,
    b: u8,
    c: u8,
}

#[test]
fn my_test() {
    bolero::check!()
        .with_arbitrary::<MyType>()
        .for_each(|value| { ... });
}

@mwhicks1
Copy link
Author

Thanks! I got that to work.

But: using Arbitrary for my recursive data type seems to be way slower than using a TypeGenerator. Support for the latter was added with the fix for #121, which shows a simple version of my example. Here's what I get with a derived Arbitrary (2 exec/s):

INFO: seed corpus: files: 808 min: 1b max: 2706b total: 171738b rss: 52Mb
#8	pulse  cov: 65 ft: 65 corp: 1/1b exec/s: 2 rss: 52Mb
#16	pulse  cov: 65 ft: 65 corp: 1/1b exec/s: 2 rss: 52Mb
#32	pulse  cov: 65 ft: 65 corp: 1/1b exec/s: 2 rss: 53Mb
#64	pulse  cov: 65 ft: 65 corp: 1/1b exec/s: 2 rss: 53Mb

compared to this, with the derived TypeGenerator (46k exec/s):

INFO: seed corpus: files: 662 min: 1b max: 2670b total: 127250b rss: 52Mb
#818	INITED cov: 281 ft: 1376 corp: 295/48Kb exec/s: 0 rss: 82Mb
#906	REDUCE cov: 281 ft: 1376 corp: 295/48Kb lim: 2670 exec/s: 0 rss: 82Mb L: 7/2670 MS: 1 EraseBytes-
#1086	NEW    cov: 281 ft: 1377 corp: 296/51Kb lim: 2670 exec/s: 0 rss: 84Mb L: 2463/2670 MS: 4 ShuffleBytes-ChangeByte-InsertRepeatedBytes-InsertRepeatedBytes-
...
#184606	REDUCE cov: 285 ft: 1431 corp: 320/58Kb lim: 3858 exec/s: 46151 rss: 665Mb L: 2188/2706 MS: 4 PersAutoDict-CMP-CopyPart-EraseBytes- DE: "\020\000\000\000"-"\373\016\004\000"-
#185035	NEW    cov: 285 ft: 1434 corp: 321/59Kb lim: 3858 exec/s: 46258 rss: 665Mb L: 1280/2706 MS: 3 CrossOver-CopyPart-ChangeBinInt-
^C==69109== libFuzzer: run interrupted; exiting

I can file another issue if that helps, but maybe it's expected for some reason?

@camshaft
Copy link
Owner

Yeah the initial pass on the arbitrary integration is pretty unoptimal...

macro_rules! gen_from_bytes_impl {
.

It does a bunch of allocation and copying between the slices so it could be improved. That being said, 2 exec/s seems extremely low even with those things so there may be other issues.

Can you share what your recursive data type looks like and I can take a look to see if I can fix it.

@camshaft
Copy link
Owner

camshaft commented May 18, 2023

After some digging it looks like we're running into rust-fuzz/arbitrary#144. The size_hint is taking forever to compute. In #151 I have a change that caches the hint, which speeds things up from 2 to about 4000 iterations per second.

In #151, I've also added a depth guard to bolero-generator so we don't blow up the stack on recursive types. That does about 70,000 iterations per second, but looking at the flamegraph we're spending a lot more time inside libfuzzer. This likely means we're bailing a lot on inputs and libfuzzer isn't finding many that satisfy the constraints. I'm still digging in on this to confirm, though.

@camshaft
Copy link
Owner

Ok so the difference between the two seems to be coming from the way arbitrary does depth guarding and how I implemented it in #151. Arbitrary only guards depth once the byte slice is empty (see rust-fuzz/arbitrary#111) whereas I am guarding it all the time. As a result, the arbitrary-generated trees are much deeper. I think the "always guard" approach is a bit more predictable since it will always limit the tree depth to the configured limit in the harness. With the current approach in arbitrary, I was seeing trees with +150 levels, which may not be desirable for all harnesses.

@camshaft
Copy link
Owner

In the case of both arbitrary and bolero-generator I am seeing memory grow and eventually hitting a OOM at 2GB. I need to figure out if this is bolero's issue or something to do with libfuzzer/ASAN.

@mwhicks1
Copy link
Author

Thanks for looking into this! Recursive generators are hard.

I think the "always guard" approach is a bit more predictable since it will always limit the tree depth to the configured limit in the harness.

Where in the harness is this depth configured? IIRC, size-bounding is the best practice for generating recursive structures. I.e., you should generate a random structure up to a particular size, and that size can be bounded by the user or generated itself. I.e., "let list_len = rand(); return list with list_len random elements". Does the "size_hint" act as the length bound here? Apologies for not digging a lot into the code to figure it out myself ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants