Can Arbitrary be biased? #68

Closed
ThomasdenH opened this issue Dec 28, 2020 · 3 comments · Fixed by #179

Comments

@ThomasdenH
Contributor

I have a question that might be clarified in the documentation. Is Arbitrary supposed to be exactly uniformly distributed? Or is it supposed to be quick? For example, I can generate a number between 0 and 10 by dividing the value of a u8 by 26. With extra computations, the distribution could be made uniform, but is that necessary or desirable?
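To make the trade-off in the question concrete, here is a minimal sketch in plain Rust (no `arbitrary` crate involved). Integer division of a `u8` by 26 yields values 0 through 9, and the last bucket is hit slightly less often than the others. The names `bucket_by_division`, `bucket_by_rejection`, and `division_histogram` are hypothetical and chosen only for illustration; rejection sampling is just one way to do the "extra computations" mentioned above, not something the library is known to do.

```rust
/// Maps a raw byte to a small bucket by integer division (the approach from the question).
fn bucket_by_division(byte: u8) -> u8 {
    byte / 26
}

/// Hypothetical uniform alternative: reject bytes that would unbalance the buckets.
/// Returns `None` when the byte should be discarded and a fresh one drawn.
/// Assumes `buckets` is non-zero.
fn bucket_by_rejection(byte: u8, buckets: u8) -> Option<u8> {
    // Largest multiple of `buckets` that fits in the 256 possible byte values.
    let limit = (256u16 / u16::from(buckets)) * u16::from(buckets);
    if u16::from(byte) < limit {
        Some(byte % buckets)
    } else {
        None
    }
}

/// Counts how many of the 256 byte values land in each bucket under division.
fn division_histogram() -> Vec<u32> {
    let mut counts = vec![0u32; 10];
    for byte in 0..=u8::MAX {
        counts[usize::from(bucket_by_division(byte))] += 1;
    }
    counts
}

fn main() {
    // Buckets 0 through 8 each receive 26 byte values; bucket 9 receives only 22.
    println!("{:?}", division_histogram());
}
```

The rejection variant is uniform over the accepted bytes, but it consumes extra input whenever a byte is rejected, which is the kind of cost the question is asking about.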

@nagisa
Member

nagisa commented Dec 28, 2020

Arbitrary is not an RNG, so I don't think it makes much sense to impose uniformity requirements on its implementors. Instead, most implementations will, and do, focus on generating data in a way that enables faster detection of faults in fuzzing and testing contexts.

@fitzgen
Member

fitzgen commented Jan 8, 2021

At the end of the day, this is a library for fuzzing and related activities. That use case is the priority.

That said, we certainly want uniformity (or at least an approximation of it) for things like the Arbitrary implementation for u32, so long as it doesn't cost so much fuzzing throughput that overall fuzzing efficiency drops (which is vague and very specific to a particular fuzz target).

An example where we balance these things: https://github.com/rust-fuzz/arbitrary/blob/master/derive/src/lib.rs#L74-L76
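One common way to strike this kind of balance is multiply-and-shift range reduction: a uniformly distributed `u32` is mapped onto `0..count` with one multiplication and one shift, accepting a very small bias in exchange for avoiding rejection loops. The sketch below illustrates that general technique under the assumption that it is relevant here; it is not a copy of the linked lines, and `pick_index` is a hypothetical name.

```rust
/// Maps a raw u32 onto the range 0..count using a multiply and a shift.
/// The result is always less than `count` (and is 0 when `count` is 0).
/// The mapping is only approximately uniform unless `count` divides 2^32.
fn pick_index(raw: u32, count: u32) -> u32 {
    // Widen to u64 so the product cannot overflow, then keep the high 32 bits.
    ((u64::from(raw) * u64::from(count)) >> 32) as u32
}

fn main() {
    // Choose among three hypothetical alternatives from a few raw inputs.
    for raw in [0u32, 1 << 31, u32::MAX] {
        println!("raw = {raw:>10} -> index {}", pick_index(raw, 3));
    }
}
```

With `count = 3`, the three indices each cover roughly a third of the `u32` range, differing by at most one raw value, so the bias is negligible for fuzzing purposes while the cost stays at a single multiplication.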

@langston-barrett

> Is Arbitrary supposed to be exactly uniformly distributed?

Also note that it's hard to say what this means for dynamically-sized collections like Vec, or recursive data types. For the latter case, there's Boltzmann sampling, but that seems hard to implement.
