Increase archiver robustness to transient network issues #74

Open
3 tasks
Tracked by #61
zaneselvans opened this issue Feb 24, 2023 · 0 comments

zaneselvans commented Feb 24, 2023

Looking at some of the archiver logging output, the exponential backoff on failure or timeout seemed to top out at about a 1 minute wait. I've sometimes experienced network connectivity issues with the federal agency sites that last longer than that. Right now it seems like about 1 in 20 archiver runs fails due to transient network issues like connection timeouts or closed connections, but another run done 15 minutes or an hour later has no trouble.

It might make sense to back off to as much as 15 or 30 minutes when we're doing the scheduled periodic archiving runs. They'll probably run overnight and could even take a couple of hours to complete without causing any problems.

If the maximum wait time or the multiplier between steps were a parameter, we could still have the CI go quickly, while the scheduled runs can be more robust and leisurely.
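To make the idea concrete, here's a rough sketch of what a parameterized backoff could look like. This is not the archiver's current retry code: `backoff_delays`, `retry_with_backoff`, and all of the parameter names and defaults are hypothetical, and the exception types to retry on are an assumption that would need to match whatever the HTTP client actually raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(
    base: float, multiplier: float, max_wait: float, retries: int
) -> list[float]:
    """Return the wait in seconds before each retry, capped at max_wait."""
    return [min(base * multiplier**attempt, max_wait) for attempt in range(retries)]


def retry_with_backoff(
    func: Callable[[], T],
    *,
    retries: int = 5,
    base: float = 1.0,
    multiplier: float = 2.0,
    max_wait: float = 60.0,
    # Assumption: transient failures surface as these built-in exceptions.
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, waiting with capped exponential backoff between failed attempts."""
    for delay in backoff_delays(base, multiplier, max_wait, retries):
        try:
            return func()
        except retry_on:
            # Wait before trying again; the delay grows until it hits max_wait.
            sleep(delay)
    # Final attempt: any exception raised here propagates to the caller.
    return func()
```

With something like this, CI could keep a small cap such as `max_wait=60`, while the scheduled runs pass `max_wait=1800` (30 minutes) and more retries. The `sleep` argument is only there so the waits can be swapped out in tests.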

Scope

Next steps

zaneselvans changed the title from "Increase maximum wait time in exponential backoff?" to "Increase archiver robustness to transient network issues" on Feb 27, 2023
zaneselvans added this to the 2023Q2 milestone on Mar 12, 2023
Projects
Status: Icebox