
pkg/fuzzer: corpus progs with non-reproducible coverage #4639

Open
a-nogikh opened this issue Apr 4, 2024 · 3 comments

a-nogikh commented Apr 4, 2024

We regularly notice that syzkaller is not able to reproduce 100% of the accumulated corpus coverage after every restart. The same effect is visible on syzbot: https://syzkaller.appspot.com/upstream/graph/fuzzing?Instances=ci-upstream-kasan-gce-root&Metrics=TriagedCoverage&Months=1

The problem

The actual problem begins well before a syzkaller instance is restarted. At fuzzing time, not all new inputs that syzkaller adds to the corpus actually reliably reproduce the signal (coverage) they were credited with.

Here's a small experiment that tries to shed more light on the problem: a-nogikh@a4859a7

  • Clone a program after it was deflaked.
  • After minimization is done, run the minimized and the original program once more and see whether they actually reproduce info.newStableSignal.

I ran it on a local syzkaller instance that had quite a good accumulated corpus (~22K programs), so it only captured the newly found programs -- it's similar to what our syzbot instances do.

Whether the reproduced signal was exactly the same (new = after minimize, old = before minimize).

| Name | Value |
|------|-------|
| A: signal == target : new false, old false | 81 |
| B: signal == target : new false, old true | 222 |
| C: signal == target : new true, old false | 191 |
| D: signal == target : new true, old true | 1240 |

The non-minimized program gave the same signal in (B+D)/(A+B+C+D) = 84.3% of cases.
If the original program was stable, the minimized program succeeded in D/(D+B) = 84.8% of cases.
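These ratios follow directly from the counts in the table above; here is a quick sanity check in Python:

```python
# Counts A..D from the table above.
A, B, C, D = 81, 222, 191, 1240
total = A + B + C + D

# Share of cases where the non-minimized (old) program gave the same signal.
old_same = (B + D) / total
print(f"old reproduced: {old_same:.1%}")  # -> 84.3%

# Given that the original was stable, how often the minimized one also was.
min_given_old = D / (B + D)
print(f"minimized | old stable: {min_given_old:.1%}")  # -> 84.8%
```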

So is 84% just the average probability of programs reproducing any coverage after 3 runs?

Looking at the syzbot stats, I also see that the triaged coverage is usually 85-90% of the previous maximum of the corpus coverage.

Whether we have reproduced at least any of the new signal (new = after minimize, old = before minimize).

| Name | Value |
|------|-------|
| E: signal > 0 : new false, old false | 75 |
| F: signal > 0 : new false, old true | 215 |
| G: signal > 0 : new true, old false | 189 |
| H: signal > 0 : new true, old true | 1255 |

The values look very similar to the previous table.
So, in almost all cases, we either reproduce all of the new coverage, or none of it?

What do we do?

If the reproduction probability is as high as 80+%, it does not feel like we should discard such programs (or try to prevent them from reaching the corpus in the first place).

At the same time, we don't want to retry every program too many times -- corpus triage already takes 1-3 hours on our syzbot instances, and adding more iterations would only increase that.

@a-nogikh a-nogikh added the bug label Apr 4, 2024

a-nogikh commented Apr 4, 2024

If we do 4 runs in triageJob.deflake():

| Name | Value |
|------|-------|
| A: signal == target : new false, old false | 67 |
| B: signal == target : new false, old true | 139 |
| C: signal == target : new true, old false | 115 |
| D: signal == target : new true, old true | 844 |

The non-minimized programs reproduced their coverage (B+D)/(A+B+C+D) = 84.4% of the time.

If we do 5 runs in triageJob.deflake():

| Name | Value |
|------|-------|
| A: signal == target : new false, old false | 150 |
| B: signal == target : new false, old true | 513 |
| C: signal == target : new true, old false | 423 |
| D: signal == target : new true, old true | 3203 |

That's 86%.

So adding more runs doesn't change the ratio much. I assume, then, that the majority of inputs just behave this way, and by running them more times during triage we just pick the luckiest, not the most stable ones?
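The shares can be recomputed from the two tables above to confirm how little extra runs move the needle:

```python
def same_signal_share(a, b, c, d):
    """Share of cases where the non-minimized program reproduced its signal:
    (B + D) / (A + B + C + D), with counts taken from the tables above."""
    return (b + d) / (a + b + c + d)

print(f"4 runs: {same_signal_share(67, 139, 115, 844):.1%}")    # -> 84.4%
print(f"5 runs: {same_signal_share(150, 513, 423, 3203):.1%}")  # -> 86.6%
```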


Some more calculations. Since I already had a corpus, my data is somewhat skewed towards the less stable inputs -- the stable ones are already in the corpus.

So let's assume that the actual figure is higher and inputs reproduce with a 90% probability. On the way from being a corpus.db seed to being a triaged corpus item, each input must successfully run 4 times: once as a candidate and then 3 times in deflake(). That gives a total probability of success of 0.9^4 ≈ 66%.

We feed all of corpus.db twice, so the final probability of getting each input is 1-(1-0.6561)^2 = 88%. That actually looks quite similar to what we observe on syzbot.

Even if the initial probability were 95%, we would lose ~4% of the corpus each restart.
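The back-of-envelope model above can be written out explicitly. This is only a sketch of that calculation; the parameters (4 required successful runs, 2 feeds of corpus.db) are the ones stated in the comment:

```python
def survival_probability(p, runs_per_pass=4, passes=2):
    """Probability that a seed with per-run reproduction probability p
    survives at least one of `passes` triage attempts, where each attempt
    needs `runs_per_pass` consecutive successful executions
    (1 candidate run + 3 deflake() runs)."""
    per_pass = p ** runs_per_pass
    return 1 - (1 - per_pass) ** passes

print(f"p=0.90: {survival_probability(0.90):.1%}")  # -> 88.2%
print(f"p=0.95: {survival_probability(0.95):.1%}")  # -> 96.6%, i.e. ~3-4% loss
```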


a-nogikh commented Apr 4, 2024

Another experiment:

Take corpus.db from syzbot, execute these programs as candidates, deflake() new signal with 3 runs (as we do now) and then repeatedly run them to estimate the probability of reproducing info.newStableSignal for each particular input.

I need to accumulate more data (I'll attach a distribution to this comment later), but from what I see now, the median probability is around 90-95%, just like in the calculations above.

UPD:

~400 runs per triaged corpus prog.

(Attached: histogram of per-prog coverage reproduction probabilities.)

  • 52% of corpus progs reproduce coverage with a 95-100% probability.
  • 11% of corpus progs reproduce coverage with 90-95%.
  • 8% of corpus progs reproduce coverage with 85-90%.
  • 9% of corpus progs reproduce coverage with 80-85%.
  • 4.5% of corpus progs reproduce coverage with 70-75%.
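As a rough cross-check, the mean reproduction probability over just the buckets listed above (taking each bucket at its midpoint; the remaining ~15% of progs are not listed here) lands near the median claimed earlier:

```python
# (share of progs, bucket midpoint probability) from the distribution above.
buckets = [(0.52, 0.975), (0.11, 0.925), (0.08, 0.875),
           (0.09, 0.825), (0.045, 0.725)]

weighted = sum(share * p for share, p in buckets)
listed = sum(share for share, _ in buckets)
print(f"mean repro probability over listed buckets: {weighted / listed:.1%}")
# -> ~93%, consistent with the 90-95% median estimate
```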


a-nogikh commented Apr 8, 2024

Some data from a local instance that was running for several days.

Main sources of flaky coverage (i.e. signal that failed deflake()):

sendmsg$nl_route: 25.35% (total triaged=4930, flaky signal=7090)
openat: 9.06% (total triaged=3589, flaky signal=4515)
pwritev2: 8.85% (total triaged=1638, flaky signal=3362)
.extra: 10.06% (total triaged=1391, flaky signal=3102)
mkdirat: 6.91% (total triaged=1606, flaky signal=2796)
fallocate: 11.98% (total triaged=935, flaky signal=2477)
syz_mount_image$ext4: 5.20% (total triaged=2056, flaky signal=2430)
sendmmsg$inet: 10.57% (total triaged=1987, flaky signal=2422)
syz_emit_ethernet: 34.10% (total triaged=1557, flaky signal=2181)
close_range: 15.45% (total triaged=1230, flaky signal=1897)
sendfile: 9.13% (total triaged=865, flaky signal=1841)
bpf$PROG_LOAD: 21.54% (total triaged=2052, flaky signal=1819)
madvise: 9.50% (total triaged=1474, flaky signal=1806)
pread64: 22.15% (total triaged=1133, flaky signal=1554)
connect$inet: 7.55% (total triaged=1046, flaky signal=1381)
mmap: 7.12% (total triaged=1082, flaky signal=1322)
syz_mount_image$udf: 5.41% (total triaged=1183, flaky signal=1306)
syz_mount_image$hfsplus: 3.78% (total triaged=1086, flaky signal=1244)
syz_mount_image$vfat: 4.03% (total triaged=1068, flaky signal=1231)
connect$inet6: 8.40% (total triaged=1059, flaky signal=1230)
getsockopt$inet_sctp6_SCTP_SOCKOPT_CONNECTX3: 4.11% (total triaged=1046, flaky signal=1229)
syz_mount_image$ntfs3: 2.60% (total triaged=923, flaky signal=1139)
ioctl$FITRIM: 26.40% (total triaged=250, flaky signal=1125)
ioctl$sock_inet_SIOCSIFFLAGS: 13.16% (total triaged=585, flaky signal=1074)
unshare: 15.45% (total triaged=220, flaky signal=1073)
mount$9p_fd: 22.24% (total triaged=652, flaky signal=1067)
read$FUSE: 19.41% (total triaged=876, flaky signal=1062)
mknodat: 12.01% (total triaged=791, flaky signal=1042)
ioctl$SIOCSIFMTU: 15.24% (total triaged=164, flaky signal=994)
sendmmsg$inet6: 18.10% (total triaged=884, flaky signal=985)
sendmsg$IPSET_CMD_SAVE: 1.65% (total triaged=182, flaky signal=964)
syz_mount_image$btrfs: 2.35% (total triaged=638, flaky signal=950)
openat$cdrom: 0.46% (total triaged=217, flaky signal=939)
syz_mount_image$xfs: 4.03% (total triaged=447, flaky signal=935)
syz_mount_image$hfs: 4.13% (total triaged=678, flaky signal=882)

The percentage is the share of successful triageJob() runs for new signal for the particular call.
