pkg/fuzzer: corpus progs with non-reproducible coverage #4639
Comments
If we do 4 runs in triage, the non-minimized programs reproduce their coverage 84.5% of the time. If we do 5 runs, that's 86%. So adding more runs doesn't change the ratio much. I assume then that the majority of inputs just behave this way, and by running them more times in triage we just pick the luckiest ones, not the most stable ones.

Some more calculations. Since I already had a corpus, my data is somewhat skewed towards the less stable inputs -- the stable ones are already in the corpus. So let's assume that the actual figure is higher and inputs reproduce with a 90% probability. To get from being a corpus.db seed to being a triaged corpus item, each input must successfully run 4 times: once as a candidate and then 3 more times in triage. We feed all of the corpus.db seeds through this process on every restart. Even if the initial probability were 95%, we would lose ~4% of the corpus each restart.
Another experiment: take each triaged corpus program and run it repeatedly. I need to accumulate more data (I'll attach a distribution to this comment then), but from what I see now, the median probability is around 90-95%, just like in the calculations above. UPD: ~400 runs per triaged corpus prog.
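The per-program statistic collected here is just a binomial proportion, and the distribution is summarized by its median. A minimal sketch; the (successes, runs) tallies below are hypothetical placeholders, not the real experiment's numbers:

```python
from statistics import median


def reproduction_rate(successes: int, runs: int) -> float:
    """Fraction of runs in which a program reproduced its signal."""
    if runs == 0:
        raise ValueError("need at least one run")
    return successes / runs


# Hypothetical per-program tallies, ~400 runs each.
tallies = [(380, 400), (360, 400), (396, 400)]
rates = [reproduction_rate(s, n) for s, n in tallies]
median_rate = median(rates)  # the statistic quoted in the comment
```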
Some data from a local instance that was running for several days. The main sources of flakiness (= the ones that failed triage). %% is the share of successful runs.
We regularly notice that syzkaller is not able to reproduce 100% of the accumulated corpus coverage after every restart. The same effect is visible on syzbot: https://syzkaller.appspot.com/upstream/graph/fuzzing?Instances=ci-upstream-kasan-gce-root&Metrics=TriagedCoverage&Months=1
The problem
The actual problem begins much earlier than the restart of a syzkaller instance: at fuzzing time, not all new inputs that syzkaller adds to the corpus actually reliably reproduce the signal (coverage) they were thought to provide.
Here's a small experiment that tries to shed more light on the problem: a-nogikh@a4859a7 (it tracks info.newStableSignal). I ran it on a local syzkaller instance that had quite a good accumulated corpus (~22K programs), so it only captured the newly found programs -- similar to what our syzbot instances do.

Whether the reproduced signal was exactly the same (new = after minimize, old = before minimize):
The non-minimized program gave the same signal in (B+D)/(A+B+C+D) = 84.3% of cases. If the original program was stable, the minimized program was successful in D/(D+B) = 84.8% of cases. So is 84% just the average probability of programs reproducing any coverage after 3 runs?
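The two ratios can be computed straight from the four cells of the table. A small helper; the A-D counts in the usage example are hypothetical placeholders, not the experiment's actual values:

```python
def same_signal_ratio(a: int, b: int, c: int, d: int) -> float:
    """(B+D)/(A+B+C+D): share of cases where the non-minimized
    program reproduced the same signal."""
    return (b + d) / (a + b + c + d)


def minimized_success_given_stable(b: int, d: int) -> float:
    """D/(D+B): success rate of the minimized program, conditioned
    on the original program having been stable."""
    return d / (d + b)


# Hypothetical cell counts for illustration:
ratio = same_signal_ratio(10, 20, 5, 65)            # (20+65)/100
conditional = minimized_success_given_stable(20, 80)  # 80/100
```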
Looking at the syzbot stats, I also see that the triaged coverage is usually 85-90% of the previous maximum of the corpus coverage.
Whether we have reproduced at least any of the new signal (new = after minimize, old = before minimize).
The values look very similar to the previous table. So, in almost all cases, we either reproduce all of the new coverage, or none of it?
What do we do?
If the reproduction probability is as high as 80+%, it does not feel like we should discard such programs (or try to prevent them from reaching the corpus in the first place).
At the same time, we don't want to retry every program too many times -- corpus triage already takes 1-3 hours on our syzbot instances, and adding more iterations would only make it longer.
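The tradeoff can be made concrete: with n verification runs, an input whose per-run reproduction probability is p passes them all with probability p**n, so each extra run filters more unstable inputs but costs n extra executions per candidate. A hedged sketch of that arithmetic (assuming independent runs; the numbers are illustrative):

```python
def pass_probability(p: float, verification_runs: int) -> float:
    """Probability an input passes every verification run, assuming
    independent runs, each reproducing with probability p."""
    return p ** verification_runs


def extra_executions(candidates: int, verification_runs: int) -> int:
    """Upper bound on the additional executor invocations triage
    spends on verification alone."""
    return candidates * verification_runs
```

For example, going from 3 to 5 verification runs only slightly tightens the filter for p around 0.9, but adds tens of thousands of executions for a ~22K-program corpus, which is why simply adding iterations is unattractive.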