Improved strategy for dealing with deterministically flaky tests which are order sensitive #125239
Labels
module: ci
Related to continuous integration
module: tests
Issues related to tests (not the torch.testing module)
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🐛 Describe the bug
We have a pretty big flaky test problem in PyTorch CI (at time of writing, there are 389 open disables for flaky tests). Based on work done by @zou3519 et al., we have determined that a big class of these flaky issues are due to ordering problems: that is to say, the test deterministically passes when run by itself, and fails only when certain other tests run before it. We also suspect that test reordering (e.g., due to target determination) makes these tests flaky. Many flaky tests report a 50% failure rate, but folks are consistently unable to reproduce the failures.
If it is true that a test only fails when run with some other tests in a particular order, then it should be simple to reproduce problems locally. The bare minimum we need is:
We technically have both of these pieces today. Specifically, CI logs which tests it executes in each shard, and with the pytest plugin bundled with https://github.com/asottile/detect-test-pollution you can force tests to execute in a particular order. The rest of the problem is user education and UI.
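To make the order-forcing piece concrete, here is a minimal sketch of a `conftest.py` that replays a recorded test order. It is not the detect-test-pollution plugin's implementation and not anything PyTorch CI ships: the `--replay-order-file` option name and the one-node-ID-per-line file format are assumptions made up for illustration. Only the standard pytest hooks `pytest_addoption`, `pytest_collection_modifyitems`, and `pytest_deselected` are relied on.

```python
# conftest.py (sketch): run collected tests in the order listed in a file.
# ASSUMPTION: "--replay-order-file" and its file format (one pytest node ID
# per line) are hypothetical, invented for this example.
from pathlib import Path


def reorder_by_ids(collected_ids, wanted_ids):
    """Return indices into collected_ids, arranged in the order of wanted_ids.

    IDs in wanted_ids that were not collected are skipped; collected tests
    that are not listed in wanted_ids are left out of the result.
    """
    index_of = {test_id: i for i, test_id in enumerate(collected_ids)}
    return [index_of[t] for t in wanted_ids if t in index_of]


def pytest_addoption(parser):
    # Register the (hypothetical) command-line option.
    parser.addoption(
        "--replay-order-file",
        default=None,
        help="File with one test node ID per line; tests run in that order.",
    )


def pytest_collection_modifyitems(config, items):
    path = config.getoption("--replay-order-file")
    if not path:
        return  # Option not given: leave pytest's default order alone.

    lines = Path(path).read_text().splitlines()
    wanted = [line.strip() for line in lines if line.strip()]

    order = reorder_by_ids([item.nodeid for item in items], wanted)
    kept = set(order)
    deselected = [item for i, item in enumerate(items) if i not in kept]

    # Mutate the list in place so pytest sees the new order.
    items[:] = [items[i] for i in order]
    if deselected:
        config.hook.pytest_deselected(items=deselected)
```

With this file in place, something like `pytest test/ --replay-order-file order.txt` would run only the listed tests, in the listed order, assuming `order.txt` was extracted from the CI log of the failing shard.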
To give an example, #119747 is a flaky test that folks have investigated but that doesn't reproduce when run by itself. It does reproduce when you run exactly the same set of tests that ran in CI.
Here is an idealized workflow I am imagining.
We don't have to implement all of the ideal workflow, but two things in particular seem important to make easier: (1) checking whether a test fails deterministically given a test order, and (2) finding out exactly which order to run things in. It is also relatively time consuming (on the order of hours) to bisect down to the minimum set of tests that need to be run, so an offline process that can backfill this would also be useful.
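The bisection step can be sketched as follows. This is a simplified illustration, not an existing PyTorch tool: it assumes the failure is deterministic and is caused by a single earlier test, and the `target_fails_after` callback is a hypothetical stand-in for "run these tests followed by the target test and report whether the target failed" (in practice that would shell out to pytest, which is where the hours go).

```python
def find_polluter(candidates, target_fails_after):
    """Bisect `candidates` down to one test that makes the target test fail.

    candidates: test IDs that ran before the target test, in CI order.
    target_fails_after: hypothetical callback; takes a list of test IDs and
        returns True if the target fails when run after exactly those tests.

    Returns the polluting test ID, or None if the failure does not reproduce
    even with all candidates. Raises ValueError if neither half reproduces on
    its own, which means more than one test is involved and the
    single-polluter assumption does not hold.
    """
    if not target_fails_after(candidates):
        return None

    while len(candidates) > 1:
        mid = len(candidates) // 2
        first, second = candidates[:mid], candidates[mid:]
        if target_fails_after(first):
            candidates = first
        elif target_fails_after(second):
            candidates = second
        else:
            raise ValueError("failure needs tests from both halves")
    return candidates[0]


if __name__ == "__main__":
    # Fake predicate for demonstration: pretend "test_b" is the polluter.
    def fake_run(tests):
        return "test_b" in tests

    print(find_polluter(["test_a", "test_b", "test_c", "test_d"], fake_run))
```

Each round costs up to two test runs and halves the candidate list, so a shard with N preceding tests needs on the order of log2(N) rounds. That is cheap in iterations but still slow when each run takes minutes, which is the argument for doing it offline.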
Versions
main
cc @seemethere @malfet @pytorch/pytorch-dev-infra @mruberry @ZainRizvi