Use exponential backoffs on secret source errors. #732

Merged: 5 commits merged into main on May 16, 2024

benashz (Collaborator) commented May 8, 2024

Previously, the backoff duration was a fixed duration plus some jitter. This PR introduces exponential backoff for all secret-syncing controllers. The backoff is calculated and honored whenever an error is encountered while fetching from a secret source, e.g. Vault or HCPVS. The backoff configuration is controlled via new command-line arguments:

  -backoff-initial-interval duration
        Initial interval between retries on secret source errors. All errors are tried using an exponential backoff strategy. Also set from environment variable VSO_BACKOFF_INITIAL_INTERVAL. (default 5s)
  -backoff-max-interval duration
        Maximum interval between retries on secret source errors. All errors are tried using an exponential backoff strategy. Also set from environment variable VSO_BACKOFF_MAX_INTERVAL. (default 1m0s)
  -backoff-multiplier float
        Sets the multiplier for increasing the interval between retries on secret source errors. All errors are tried using an exponential backoff strategy. Also set from environment variable VSO_BACKOFF_MULTIPLIER. (default 1.5)
  -backoff-randomization-factor float
        Sets the randomization factor to add jitter to the interval between retries on secret source errors. All errors are tried using an exponential backoff strategy. Also set from environment variable VSO_BACKOFF_RANDOMIZATION_FACTOR. (default 0.5)

Or through the Helm chart values:

    # Backoff settings for the controller manager. These settings control the backoff behavior
    # when the controller encounters an error while fetching secrets from the SecretSource.
    backoffOnSecretSourceError:
      # Initial interval between retries.
      # @type: duration
      initialInterval: "5s"
      # Maximum interval between retries.
      # @type: duration
      maxInterval: "60s"
      # Randomization factor to add jitter to the interval between retries.
      # @type: float
      randomizationFactor: 0.5
      # Sets the multiplier for increasing the interval between retries.
      # @type: float
      multiplier: 1.5

benashz force-pushed the VAULT-19199/core-support-backoffs-on-sync-error branch 2 times, most recently from 952bb8d to e08748b, on May 14, 2024 10:49
benashz changed the title from "WIP: Use exponential backoffs on secret source errors." to "Use exponential backoffs on secret source errors." on May 14, 2024
benashz marked this pull request as ready for review on May 14, 2024 15:38
benashz requested a review from a team as a code owner on May 14, 2024 15:38
benashz force-pushed the VAULT-19199/core-support-backoffs-on-sync-error branch from e08748b to 6ace089 on May 14, 2024 15:41
benashz requested review from tvoran and thyton on May 15, 2024 13:05

thyton (Contributor) left a comment

LGTM from my initial pass

tvoran (Member) left a comment

👍

I wonder if it would make sense to log the backoff settings on startup? Just to make it easier for users to see what's set.

Review threads (resolved): controllers/registry_test.go (outdated), controllers/vaultpkisecret_controller.go

benashz (Collaborator, Author) commented May 16, 2024

> 👍
>
> I wonder if it would make sense to log the backoff settings on startup? Just to make it easier for users to see what's set.

Thanks! I made that change in 15e14c9. I also added a new Prometheus metric that includes the same info:

vso_runtime_config{backOffInitialInterval="5s",backOffMaxInterval="1m0s",backOffMultiplier="1.50",backOffRandomizationFactor="0.50",clientCachePersistenceModel="direct-encrypted",clientCacheSize="10000",globalTransformationOptions="",maxConcurrentReconciles="100"} 1
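
Note that the Helm value `"60s"` and the label value `1m0s` denote the same duration. A small sketch of how such label strings can be produced with Go's standard library follows. The `backoffLabels` helper is hypothetical, and the formatting choices (`time.Duration.String` for intervals, two decimal places for floats) are assumptions inferred from the sample output above, not taken from the operator's code.

```go
package main

import (
	"fmt"
	"time"
)

// backoffLabels renders the backoff settings as label strings.
// Hypothetical helper: intervals are parsed as Go durations and printed in
// their canonical form; floats are printed with two decimal places.
func backoffLabels(initial, maxInterval string, multiplier, randomization float64) (map[string]string, error) {
	ini, err := time.ParseDuration(initial)
	if err != nil {
		return nil, fmt.Errorf("parse initial interval %q: %w", initial, err)
	}
	mx, err := time.ParseDuration(maxInterval)
	if err != nil {
		return nil, fmt.Errorf("parse max interval %q: %w", maxInterval, err)
	}
	return map[string]string{
		"backOffInitialInterval":     ini.String(),
		"backOffMaxInterval":         mx.String(),
		"backOffMultiplier":          fmt.Sprintf("%.2f", multiplier),
		"backOffRandomizationFactor": fmt.Sprintf("%.2f", randomization),
	}, nil
}

func main() {
	// Inputs as written in the Helm values above.
	labels, err := backoffLabels("5s", "60s", 1.5, 0.5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print in a fixed order, since Go map iteration order is not stable.
	for _, k := range []string{
		"backOffInitialInterval",
		"backOffMaxInterval",
		"backOffMultiplier",
		"backOffRandomizationFactor",
	} {
		fmt.Printf("%s=%q\n", k, labels[k])
	}
}
```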

benashz merged commit 17f8448 into main on May 16, 2024
38 checks passed
benashz deleted the VAULT-19199/core-support-backoffs-on-sync-error branch on May 16, 2024 14:55