
Integration test intermittent failures #1067

Open
briantist opened this issue Sep 20, 2023 · 2 comments
Labels
CI/CD related to CI/CD (not necessarily tests) developer experience Developer setup and experience help wanted Contributions welcome! tests related to tests (not necessarily CI/CD)

Comments

@briantist
Contributor

briantist commented Sep 20, 2023

Example:

These come up from time to time, and typically re-running the tests works fine.

The issue seems to be that the tests delete a value from Vault, then check to ensure it was deleted. Sometimes, the value was "unexpectedly found" when checked.

My guess is that this is a sort of race condition: Vault accepts the delete request but has not actually deleted the value yet, and the next request arrives so quickly that Vault returns the value before the deletion takes effect.

If so, we could solve it in tests by either delaying before checking, or retrying the check on failure (that is, retrying the request if it succeeds and returns the value that should no longer be there).

I prefer a retry mechanism over a sleep.
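A minimal sketch of the retry-on-success idea in plain Python. `retry_while` and `EventuallyDeletedStore` are hypothetical names, not part of hvac or its test suite, and the fake store only assumes that a delete becomes visible after a few reads. The helper keeps re-issuing the check while the deleted value is still returned, backs off between attempts, and returns the last result so the test's own assertion still fails if the value never goes away.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_while(
    fetch: Callable[[], T],
    should_retry: Callable[[T], bool],
    attempts: int = 5,
    initial_delay: float = 0.1,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until should_retry(result) is False or attempts run out.

    Unlike a typical HTTP retry, this retries on a *successful* result
    (e.g. the deleted value is still returned). The last result is returned
    either way, so the caller's assertion decides pass/fail.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    result = fetch()
    for _ in range(attempts - 1):
        if not should_retry(result):
            break
        sleep(delay)
        delay *= backoff  # exponential backoff between checks
        result = fetch()
    return result


class EventuallyDeletedStore:
    """Hypothetical stand-in for Vault: the deleted key keeps showing up
    for `stale_reads` reads, mimicking a delete that has not landed yet."""

    def __init__(self, stale_reads: int):
        self.keys = {"my-secret"}
        self.stale_reads = stale_reads
        self.reads = 0

    def list_keys(self) -> set:
        self.reads += 1
        if self.reads > self.stale_reads:
            self.keys.discard("my-secret")
        return set(self.keys)


store = EventuallyDeletedStore(stale_reads=2)
# Retry while the key we just deleted is still listed; no real sleeping here.
keys = retry_while(store.list_keys, lambda ks: "my-secret" in ks, sleep=lambda s: None)
assert "my-secret" not in keys
```

Injecting `sleep` keeps the example from actually waiting; a real integration test would use the default `time.sleep`, and the attempt count and backoff would need tuning against CI.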

If we still see failures with a retry, then we might have a different problem, where intermittently the delete request itself never reaches Vault or is never acted on. This seems unlikely, but we won't know until we add some retries.

I don't think I've ever seen this when running the tests locally, but it could happen there too.

@briantist briantist added help wanted Contributions welcome! CI/CD related to CI/CD (not necessarily tests) tests related to tests (not necessarily CI/CD) developer experience Developer setup and experience labels Sep 20, 2023
@erickisos

Looks like an interesting case. Do you know whether this happens in multiple tests, or is it always the same flaky one?

@briantist
Contributor Author

Hi @erickisos, unfortunately I don't remember. I didn't record previous occurrences, and I don't think I've seen any new ones since creating this issue. But it can only occur in tests where we make another call to Vault after the delete.

So in this search: https://github.com/search?q=repo%3Ahvac%2Fhvac+self.assertNotIn+path%3A%2F%5Etests%5C%2Fintegration_tests%5C%2F%2F&type=code

The assertions where one of the operands is a call that reaches out to Vault are the ones that would be susceptible.
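To make one of those susceptible assertions retry, the Vault call has to be passed as a callable so it can be re-issued, rather than evaluated once as an argument to `assertNotIn`. A minimal sketch, assuming a hypothetical `EventuallyAssertionsMixin` and a fake client standing in for Vault (neither exists in hvac or its test suite):

```python
import time
import unittest


class EventuallyAssertionsMixin:
    """Hypothetical mixin for unittest.TestCase: re-evaluates container_fn
    until `member` is absent, instead of evaluating the container once."""

    def assertEventuallyNotIn(self, member, container_fn, attempts=5, delay=0.1):
        container = container_fn()
        for _ in range(attempts - 1):
            if member not in container:
                return
            time.sleep(delay)
            container = container_fn()  # re-issue the (Vault) call
        # Final check: fails with the usual assertNotIn message if still present.
        self.assertNotIn(member, container)


class FakeKvClient:
    """Hypothetical stand-in for a Vault client whose delete is briefly not visible."""

    def __init__(self, stale_reads):
        self.stale_reads = stale_reads
        self.reads = 0

    def list_secrets(self):
        self.reads += 1
        return [] if self.reads > self.stale_reads else ["deleted-path"]


class ExampleTest(EventuallyAssertionsMixin, unittest.TestCase):
    def test_delete_is_eventually_visible(self):
        client = FakeKvClient(stale_reads=2)
        # Susceptible form: self.assertNotIn("deleted-path", client.list_secrets())
        # Wrapping the call in a lambda lets the helper repeat it on each attempt.
        self.assertEventuallyNotIn("deleted-path", lambda: client.list_secrets(), delay=0)
```

Because the final check still goes through `assertNotIn`, a value that never disappears produces the same failure message as today, just after a few extra attempts.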

The specific call in the CI run in this issue is this one:


One way that we might be able to add retries is with the method described here:
https://hvac.readthedocs.io/en/stable/advanced_usage.html#retrying-failed-requests

But it might have to be used in selected cases only, because we'd be tweaking things like the backoff and number of retries, and in particular we'd be retrying on successful response codes rather than on failures as usual. Bit of a weird situation!

Anyway just an idea, thanks for your interest!
