Configure retry loop #602

Open
mbwhite opened this issue May 26, 2022 · 1 comment

Comments

@mbwhite
Collaborator

mbwhite commented May 26, 2022

In the event of a network error or a timeout on REST calls, the modules calling the IBP API will retry, up to a maximum of 5 attempts.
This is not configurable, which can result in errors.

This needs to be made configurable for the components in question.

@mbwhite
Collaborator Author

mbwhite commented Jun 9, 2022

Documentation

When requests are made to IBP (or another provider) over the network (e.g. via REST), there are two important timeouts and one retry setting.

Any individual REST call has a timeout, after which the call is aborted and the task fails. This is set via the 'api_timeout' value, in seconds.

When creating resources such as CAs, Peers, Ordering Services etc., processing is still occurring on the node after the initial create API call has returned OK. The tasks therefore check the health status of each node. The time to wait for a successful health response is the 'wait_timeout'.

Summary: the api_timeout is the timeout on the initial API call, e.g. createCertificateAuthority, and the wait_timeout is the time to wait for a good health status to be reported.
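To make the distinction concrete, here is a minimal Python sketch of the two phases. It is not the collection's actual implementation: `create_and_wait`, `create`, `is_healthy` and `poll_interval` are hypothetical names used only for illustration, and the polling behaviour is an assumption. Only `api_timeout` and `wait_timeout` come from the description above.

```python
import time


def create_and_wait(create, is_healthy, api_timeout, wait_timeout,
                    poll_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Hypothetical sketch of the two timeouts; not the real module code.

    create      -- callable performing the initial API call (assumed name)
    is_healthy  -- callable returning True once the node reports good health
    """
    # Phase 1: the initial API call is bounded by api_timeout (seconds).
    resource = create(timeout=api_timeout)

    # Phase 2: the node is still starting up, so poll its health
    # until it is good or wait_timeout (seconds) has elapsed.
    deadline = clock() + wait_timeout
    while True:
        if is_healthy(resource):
            return resource
        if clock() >= deadline:
            raise TimeoutError("node did not report good health within wait_timeout")
        sleep(poll_interval)
```

The two values are independent: a create call can return well inside api_timeout and the task can still fail later because the node never becomes healthy within wait_timeout.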

In addition, there is a built-in retry loop that will retry the initial API call (e.g. createCertificateAuthority) if it fails for something like a network error. It is intended to cope with transient errors, and it also retries in the case of timeouts. In previous Ansible Collection releases, the number of retries has been fixed at 5. This can cause multiple resources to be created if only display_name (which is not unique) is specified and a specific timing window is hit. The number of retries can be set with 'error-retries', and the recommendation is now to set it to 0 (this will become the default in the future).
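The duplicate-resource hazard can be illustrated with a small simulation. This is only a sketch under stated assumptions: `FakeProvider` and `create_with_retries` are hypothetical and are not part of the collection or the IBP API, `error_retries` is assumed here to mean the number of extra attempts after the first, and the timing window is assumed to be a client-side timeout that fires after the provider has already accepted the create.

```python
class FakeProvider:
    """Hypothetical stand-in for the remote API; not the real IBP client."""

    def __init__(self, slow_calls):
        self.resources = []
        # Number of initial calls whose response arrives too late for the client.
        self.slow_calls = slow_calls

    def create(self, display_name):
        # The provider creates the resource on every call it receives ...
        resource = {"id": len(self.resources) + 1, "display_name": display_name}
        self.resources.append(resource)
        # ... but for a slow call the client has already given up waiting.
        if self.slow_calls > 0:
            self.slow_calls -= 1
            raise TimeoutError("client timed out; the create still completed")
        return resource


def create_with_retries(provider, display_name, error_retries):
    """Retry the create on transient errors (assumed semantics, see lead-in)."""
    last_error = None
    for _ in range(error_retries + 1):
        try:
            return provider.create(display_name)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
    raise last_error
```

With `FakeProvider(slow_calls=1)` and `error_retries=5`, the first attempt times out, the second succeeds, and the provider ends up holding two resources with the same display_name. With `error_retries=0`, the task fails after one attempt and only one resource exists.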

If something does fail for transient reasons, the best approach is to rerun the Ansible script.
