
Test Flakes


Flaky tests are tests that give inconsistent results when run multiple times. For example, a test may fail 1% of the time, but pass all other times.

Flaky test sources

Test flakes can be caused by a variety of issues:

  1. Bugs in the application code. This means the test did its job and found a bug!
  2. Bugs in the test code. This means the test is likely poorly written or makes incorrect assumptions.
  3. Bugs in the infrastructure or dependencies. This means something is wrong with the CI environment or dependencies. For example, a node randomly dies and breaks all tests running on that node, the network breaks, a dependency like GitHub is down, etc.

Unless proven otherwise, a test flake should be assumed to be case 1, and treated as high priority. Historically, we have seen tests that fail 0.01% of the time that represented real, critical bugs in our code. A test failure gives an important signal, even if a rerun makes the test pass, and should not be ignored.

Flaky test causes

Timing dependencies

Tests that rely on timing are sure to experience flakes eventually. For example, consider the test:

go runSomeCode()
time.Sleep(time.Second) // 1s ought to be enough for it to finish
if something { t.Fatalf("something went wrong") }

While this may pass when run locally, and even seem to work in CI, it will likely eventually fail. Our CI runs on shared nodes, which may occasionally experience extreme throttling. Multi-second pauses are not uncommon; some tests have even been shown to take over 15s to sign a single RSA key.

The obvious fix here is to bump the sleep up to something extremely high, such as 10s. However, this is also a bad idea: every test now pays that delay even when it passes immediately, making the whole suite slow.

The correct fix is to poll for success. For example, the above example could be rewritten:

complete := make(chan struct{})
go func() {
  runSomeCode()
  close(complete)
}()
<-complete // Wait for it to finish. If this may never finish, add a timeout (see the sketch below)
if something { t.Fatalf("something went wrong") }

Or

go runSomeCode()
retry.UntilOrFail(t, CodeHasCompletedFunction) // poll until CodeHasCompletedFunction returns true
if something { t.Fatalf("something went wrong") }

Both of these are far less likely to flake, and they run far faster, as there are no arbitrary delays in the tests.
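
As the comment in the first example notes, a wait that may never complete should be bounded. A minimal sketch using select and time.After (the 30s bound is illustrative; it is only hit on a genuine hang, so passing tests stay fast):

complete := make(chan struct{})
go func() {
  runSomeCode()
  close(complete)
}()
select {
case <-complete:
  // Finished normally; no arbitrary delay was paid.
case <-time.After(30 * time.Second):
  t.Fatalf("timed out waiting for runSomeCode")
}
if something { t.Fatalf("something went wrong") }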

Port conflicts

Tests may run in parallel, which may result in port conflicts. Wherever possible, tests should avoid listening on real network interfaces. For example, gRPC tests can use a bufconn and HTTP tests can use httptest.
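
For instance, a minimal sketch using the standard library's net/http/httptest (imports: io, net/http, net/http/httptest, testing; the handler and assertion are illustrative). Note that httptest still binds a real loopback port, but the OS picks a free one, so parallel tests cannot collide:

func TestHandler(t *testing.T) {
  // httptest.NewServer binds 127.0.0.1:0, so the OS assigns a free port.
  srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    io.WriteString(w, "ok")
  }))
  defer srv.Close() // release the port when the test ends

  resp, err := http.Get(srv.URL) // srv.URL includes the assigned address
  if err != nil {
    t.Fatalf("request failed: %v", err)
  }
  resp.Body.Close()
  if resp.StatusCode != http.StatusOK {
    t.Fatalf("unexpected status: %v", resp.StatusCode)
  }
}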

When binding to a real port is required, prefer binding to port 0, which lets the OS allocate a free port automatically.
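
A minimal sketch of the port-0 pattern (variable names are illustrative):

// Binding to port 0 asks the kernel for any free port.
lis, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
  t.Fatalf("failed to listen: %v", err)
}
defer lis.Close()
port := lis.Addr().(*net.TCPAddr).Port // the port the OS actually assigned

Where possible, hand the live listener to the code under test rather than closing it and re-binding the port by number; re-binding reintroduces the race this pattern is meant to avoid.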

If you really cannot do either of these, the reserveport.PortManager can help reserve a port. However, this approach is inherently racy; while it may reduce test flakes, it does not eliminate them. As such, it should be considered a last resort.

Reproducing test flakes

For rare issues, it may be difficult to reproduce the failure locally by just running go test, or even go test -count X.

For local reproduction, it is recommended to run tests many times in parallel using stress, or @howardjohn's fork, which adds some more features and information.

Install stress (make sure you have GOBIN set):

go install golang.org/x/tools/cmd/stress@latest

Example usage:

go test -c -race ./path/to/test/pkg # Compile the test package, with race enabled
stress ./pkg.test -test.v -test.run NameOfTest

Tips:

  • It is usually best to run tests with -race. This changes the timing of tests substantially and matches what is run in CI.
  • Try to run as few tests as possible by selecting a single one with -test.run.
  • Ensure you run the test inside the directory where the test lives. Otherwise, tests relying on testdata will likely fail.
  • If the test fails when run in parallel, it may need to be tweaked to support parallel runs. This can be done by ensuring the tests do not write to shared resources such as files (in favor of temporary files) and ports (in favor of random ports). If this is not possible, stress has a -p flag to control concurrency (see the example after this list); setting this to 1 should remove these failures but also make it much harder to find flakes.
  • Some flakes may not show up for many minutes. It is best to run for a while to ensure there are no failures.
  • Tests with high timeouts may mask failures. For example, if I run stress for 9 minutes, but the test will not fail until a 10 minute timeout fires, I will not detect the failure.
  • Running this may freeze your computer, especially on a laptop.
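
Putting the working-directory and concurrency tips together, a typical invocation might look like this (the path and test name are illustrative):

cd ./path/to/test/pkg                               # run from the package directory so testdata resolves
go test -c -race .                                  # produces pkg.test, named after the package
stress -p 1 ./pkg.test -test.v -test.run NameOfTest # -p 1 runs one process at a time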
