Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Nix] adjust pytest retrys #4558

Merged
merged 10 commits into from
May 29, 2024
Merged

[Nix] adjust pytest retrys #4558

merged 10 commits into from
May 29, 2024

Conversation

dougch
Copy link
Contributor

@dougch dougch commented May 15, 2024

Resolved issues:

part of #3841

Description of changes:

Several tests were failing under nix on arm (e.g. happy_path), but upon investigation, the pytest arguments for nix were less forgiving than our other ci jobs.

This PR aligns the default pytest retry count, from 0 to 2 globally.

These changes address the following test failures:

The following tests FAILED:
        271 - integrationv2_happy_path (Failed)
        275 - integrationv2_ocsp (Failed)
        277 - integrationv2_record_padding (Failed)
        278 - integrationv2_renegotiate (Failed)
        279 - integrationv2_session_resumption (Failed)
        280 - integrationv2_signature_algorithms (Failed)
        282 - integrationv2_version_negotiation (Failed)

An ad-hoc job running just the above on both architectures is here

Call-outs:

Unfortunately, a few tests need more retries than just 2 or a pause between retries. Iterating through these failures, I've bumped up the retry count/delay using pytest rerunfailures flaky decorator, but only on specific tests while running on arm.

For renegotiate, I also observed many of the retries were from us hitting the sub-process timeout of 5 seconds. Experimenting with huge timeouts led to landing on 8 seconds as a more reliable timeout.

Testing:

How is this change tested (unit tests, fuzz tests, etc.)? local/ci

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions github-actions bot added the s2n-core team label May 15, 2024
@dougch dougch force-pushed the nix_retries branch 5 times, most recently from 5ee7ea9 to dc33e28 Compare May 21, 2024 19:48
@dougch dougch changed the title Nix pytest retrys should match the other CI job [Nix] adjust pytest retrys May 21, 2024
@dougch dougch marked this pull request as ready for review May 22, 2024 21:39
@dougch dougch requested review from jmayclin and goatgoose May 22, 2024 21:39
@@ -243,6 +244,7 @@ def test_tls13_session_resumption_s2n_client(managed_process, cipher, curve, cer
b'SSL_accept:SSLv3/TLS write certificate') == num_full_connections


@pytest.mark.flaky(reruns=7, reruns_delay=2, condition=platform.machine().startswith("aarch"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the errors when these tests fail due to flakiness? It seems like the tests could be broken if they require 5 or 7 retries in order to succeed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current working theory is that this is the test framework/interactions with subprocesses. I bumped the timeout values from 5 seconds to 60, and then noticed that the retried tests count exactly matches the number of tests running for 60 seconds. So these are being timed out by pytest, unclear why.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting ok. I guess the concern is that if we make a change that causes one of these tests to become flaky, we likely won't notice it since they're retrying so many times. But I guess the extra retries only apply to arm, so as long as we continue running them in an environment with retries disabled we should notice a flaky regression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fair, we're running these now on x86 with 2 retires, so there is no CI running these specific integrationv2 tests with 0 retires. I don't believe we're collecting retry metrics, which would be an interesting datapoint...

CMakeLists.txt Show resolved Hide resolved
CMakeLists.txt Show resolved Hide resolved
@dougch dougch requested a review from goatgoose May 23, 2024 19:20
CMakeLists.txt Show resolved Hide resolved
@dougch dougch enabled auto-merge (squash) May 29, 2024 18:14
@dougch dougch added the type/nix related to nix label May 29, 2024
@dougch dougch merged commit 622bcd3 into aws:main May 29, 2024
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
s2n-core team type/nix related to nix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants