Reversed arbitrary #44

Open
epilys opened this issue May 16, 2020 · 10 comments

Comments

@epilys

epilys commented May 16, 2020

Hello,

I've been toying with custom mutators in cargo-fuzz with libFuzzer, and it seems to me that to support mutating typed data instead of bytes, a reversed arbitrary operation would be really helpful. Here's the interface I have implemented so far:

/// Define a custom fuzz mutator.
///
/// If `$bytes` exceeds `$max_size`, it will be silently truncated.
///
/// ## Example
/// ```no_run
/// #![no_main]
/// use libfuzzer_sys::{fuzz_target, fuzz_mutator, llvm_fuzzer_mutate};
///
/// fuzz_target!(|data: &[u8]| {
///     let _ = std::str::from_utf8(data);
/// });
///
/// fuzz_mutator!(|data: &mut [u8], max_size: usize| {
///     println!("custom mutator called with data len = {} and max_size = {}", data.len(), max_size);
///     /* call wrapper function of libfuzzer's default mutator */
///     llvm_fuzzer_mutate(data, max_size)
/// });
/// ``` 

The mutator with typed data would be the equivalent:

fuzz_mutator!(|data: &mut T, max_size: usize| {
    loop {
        /* perform changes on `data` */

        /* `arbitrary_size` is some method from the trait that calculates the size in bytes */
        if data.arbitrary_size() <= max_size {
            break;
        }
    }
});
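
One way the loop above might look once decoding and re-encoding are spelled out, as a minimal sketch. The `Reversible` trait, its `decode`/`encode` methods, and `mutate_typed` are hypothetical names, not part of `arbitrary` or `libfuzzer-sys`. Here `encode().len()` plays the role of `arbitrary_size`, and the retry loop is bounded by an assumed `max_attempts` parameter so it cannot spin forever.

```rust
/// Hypothetical trait: a value that can be decoded from fuzzer bytes and
/// encoded back into them. Not part of the `arbitrary` crate.
pub trait Reversible: Sized {
    fn decode(bytes: &[u8]) -> Self;
    fn encode(&self) -> Vec<u8>;
}

/// Toy implementation: a list of little-endian u16 values.
/// A trailing odd byte is ignored when decoding.
impl Reversible for Vec<u16> {
    fn decode(bytes: &[u8]) -> Self {
        bytes
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
            .collect()
    }

    fn encode(&self) -> Vec<u8> {
        self.iter().flat_map(|value| value.to_le_bytes()).collect()
    }
}

/// Decode `input`, apply `mutate` to a copy, and return the re-encoded bytes
/// of the first candidate that fits in `max_size`.
/// If no attempt fits, the input is returned truncated to `max_size`.
pub fn mutate_typed<T, F>(
    input: &[u8],
    max_size: usize,
    max_attempts: usize,
    mut mutate: F,
) -> Vec<u8>
where
    T: Reversible + Clone,
    F: FnMut(&mut T),
{
    let original = T::decode(input);
    for _ in 0..max_attempts {
        // Always mutate a fresh copy so a rejected attempt leaves no trace.
        let mut candidate = original.clone();
        mutate(&mut candidate);
        let encoded = candidate.encode();
        if encoded.len() <= max_size {
            return encoded;
        }
    }
    input[..input.len().min(max_size)].to_vec()
}
```

Calling `mutate_typed::<Vec<u16>, _>(&bytes, max_size, 4, |v| v.push(0x0403))` appends one element when there is room and otherwise hands back the unchanged input. Falling back to the original bytes is a design choice of this sketch; a real mutator might prefer to shrink the value instead.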

Does this sound like a reasonable approach to you?

@fitzgen
Member

fitzgen commented May 18, 2020

So the idea is that this would also involve adding something like an as_arbitrary_bytes method on the Arbitrary trait? And fuzz_mutator! would construct the T: Arbitrary for you, let you mutate it, and then call as_arbitrary_bytes to give the bytes back to libFuzzer?

And we would want x == T::arbitrary_take_rest(x.as_arbitrary_bytes()) to hold, where x: T? (Semi-aside: it might be hard to maintain this property for our arbitrary-length-getting functions.)
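
To make the round-trip property concrete, here is a minimal sketch on a toy type. `Request`, `from_bytes`, and `to_bytes` are hypothetical stand-ins for `T`, `arbitrary_take_rest`, and the proposed `as_arbitrary_bytes`; none of them are part of the `arbitrary` crate, and the byte layout is an assumption made purely for illustration.

```rust
/// Toy structured input (hypothetical, not a type from `arbitrary`).
#[derive(Debug, Clone, PartialEq)]
pub struct Request {
    pub id: u16,
    pub urgent: bool,
    pub payload: Vec<u8>,
}

impl Request {
    /// Decode in the spirit of `arbitrary_take_rest`: fixed-size fields come
    /// first, then every remaining byte becomes the payload.
    /// Bytes missing from a short input are read as zero.
    pub fn from_bytes(bytes: &[u8]) -> Request {
        let byte_at = |index: usize| bytes.get(index).copied().unwrap_or(0);
        Request {
            id: u16::from_le_bytes([byte_at(0), byte_at(1)]),
            urgent: byte_at(2) & 1 == 1,
            payload: bytes.get(3..).unwrap_or(&[]).to_vec(),
        }
    }

    /// Hypothetical inverse of `from_bytes` (the `as_arbitrary_bytes` idea).
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(3 + self.payload.len());
        out.extend_from_slice(&self.id.to_le_bytes());
        out.push(self.urgent as u8);
        out.extend_from_slice(&self.payload);
        out
    }
}
```

With this layout, value to bytes to value is the identity, which is the property asked for above. The other direction is not: a flag byte of `0xFF` decodes to `true` and re-encodes as `1`, so several byte strings map to the same value.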

This seems like a nice thing to have, but I haven't totally thought through how it might play out in terms of interaction with the Arbitrary trait, how nice we can keep the ux, and how well things interact and compose in practice.

However, a first step that I feel is safe to make without answering all those unknowns is to add a libfuzzer_sys::fuzz_mutator!(|data: &mut [u8], max_len: usize| { ... }) macro, that just supports [u8] and does not support T: Arbitrary.

@fitzgen
Member

fitzgen commented May 18, 2020

@Manishearth do you have thoughts on this, and how we might integrate it smoothly into the Arbitrary trait?

@Manishearth
Member

I'm not really sure! I think it's possible, but it might be annoying

@zommiommy

This might also be useful for providing a corpus when using structure-aware fuzzing.

@vorner

vorner commented Nov 11, 2020

I was thinking this would be helpful for me too, as I would like a way to provide some starting inputs for the search. (I'm contemplating using Arbitrary for something other than fuzzing, though the property that a small change in input yields a small change in output sounds beneficial for my use case.) And generating the input bytes by hand isn't going to be exactly ergonomic 😇.

I was wondering whether this would be better as a separate crate (e.g. Unstructure or FromArbitrary), one that could be derived separately (or simply not implemented by hand if not needed). The derive would of course provide a "matching" implementation.

@mykter

mykter commented Feb 27, 2021

This might also be useful for providing a corpus when using structure-aware fuzzing.

This is my use case as well. I'd like to benefit from the wonderfully clean fuzz harnesses that Arbitrary enables without sacrificing the ability to use a seed corpus.
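
A minimal sketch of what seeding could look like once a value can be turned back into bytes. `Login`, `to_seed_bytes`, and `write_seed_corpus` are hypothetical, and the byte layout is an assumption: it would have to match whatever the fuzz target's decoding actually consumes. The only thing relied on here is that libFuzzer treats each file in a corpus directory as one input.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Toy structured input (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct Login {
    pub user_id: u32,
    pub name: String,
}

impl Login {
    /// Assumed layout: 4-byte little-endian id, then the name as UTF-8 bytes.
    pub fn to_seed_bytes(&self) -> Vec<u8> {
        let mut out = self.user_id.to_le_bytes().to_vec();
        out.extend_from_slice(self.name.as_bytes());
        out
    }
}

/// Write one corpus file per seed and return the paths that were written.
pub fn write_seed_corpus(dir: &Path, seeds: &[Login]) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(dir)?;
    let mut written = Vec::with_capacity(seeds.len());
    for (index, seed) in seeds.iter().enumerate() {
        let path = dir.join(format!("seed-{:04}", index));
        fs::write(&path, seed.to_seed_bytes())?;
        written.push(path);
    }
    Ok(written)
}
```

With cargo-fuzz, the directory would typically be `fuzz/corpus/<target name>`, so hand-written structured seeds would sit next to the inputs the fuzzer discovers on its own.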

@bitwave

bitwave commented Nov 22, 2021

I would use this for seeding purposes as well...

@bitwave

bitwave commented Dec 25, 2021

I started implementing a dearbitrary function in https://github.com/bitwave/arbitrary/tree/revert-mode
I'll try to create a PR in the next few days...

@bitwave

bitwave commented Dec 25, 2021

see #94

@evanrichter

for a use-case other than corpus seeding ... (which I really want also!!!)

... more intelligent tmin permutations! I imagine permuting over structured simplifications would yield much faster and likely better quality shrinking.

Hmm, maybe we can cheese this without dearbitrary by recompiling the target with a feature flag that would then use tmin logic inside the fuzz_target! macro. Something like:

unstructured bytes in (that you want to minimize) --> structured data (using Arbitrary) --> apply shrinking strategies over the code in the body of fuzz_target!
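
A minimal sketch of that pipeline, assuming a toy decoding step and a single shrinking strategy. `decode`, `drop_one`, and `shrink` are hypothetical helpers, not part of `arbitrary`, `libfuzzer-sys`, or any existing tmin implementation. The `still_fails` predicate stands in for running the body of `fuzz_target!` and observing whether it still crashes.

```rust
/// Toy decoding step standing in for `Arbitrary`: little-endian u16 values.
/// A trailing odd byte is ignored.
pub fn decode(bytes: &[u8]) -> Vec<u16> {
    bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}

/// One structured shrinking strategy: every copy of the list with exactly
/// one element removed.
pub fn drop_one(items: &Vec<u16>) -> Vec<Vec<u16>> {
    (0..items.len())
        .map(|skip| {
            items
                .iter()
                .enumerate()
                .filter(|(index, _)| *index != skip)
                .map(|(_, value)| *value)
                .collect()
        })
        .collect()
}

/// Greedy shrinking: keep replacing the current value with the first
/// candidate that still fails, and stop when no candidate does.
/// Terminates as long as every candidate is strictly smaller than its parent.
pub fn shrink<T, S, P>(mut current: T, candidates: S, still_fails: P) -> T
where
    S: Fn(&T) -> Vec<T>,
    P: Fn(&T) -> bool,
{
    loop {
        let next = candidates(&current)
            .into_iter()
            .find(|candidate| still_fails(candidate));
        match next {
            Some(smaller) => current = smaller,
            None => return current,
        }
    }
}
```

For example, `shrink(decode(bytes), drop_one, |v: &Vec<u16>| v.contains(&42))` strips every element that is not needed to keep the predicate true. Shrinking at the element level never produces inputs that split a value in half, which is the kind of waste byte-level minimization runs into.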

8 participants