Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: When AZURE_SYNC_PERIOD is defined as empty, no periodic resync is performed #3965

Open
nojnhuh opened this issue Apr 22, 2024 · 5 comments
Assignees
Labels
bug 🪲 Something isn't working documentation Improvements or additions to documentation
Milestone

Comments

@nojnhuh
Copy link
Member

nojnhuh commented Apr 22, 2024

Version of Azure Service Operator

v2.6.0

Describe the bug
Per the documentation, when AZURE_SYNC_PERIOD is defined but empty in the aso-controller-settings Secret, the default of "1h" should be in effect. In practice, I observed no periodic resyncs within 90 minutes or so.

### AZURE_SYNC_PERIOD
AZURE_SYNC_PERIOD is the frequency at which resources are re-reconciled with Azure when
there have been no triggering changes in the Kubernetes resources.
This sync exists to detect and correct changes that happened in Azure that Kubernetes is not aware about.
BE VERY CAREFUL setting this value low - even a modest number of resources can cause
subscription level throttling if they are re-synced frequently. If nil or empty (`""`), sync period defaults to `1h`.

Tracing where the config is consumed here:

func parseSyncPeriod() (*time.Duration, error) {
syncPeriodStr := envOrDefault(config.SyncPeriod, "1h")
if syncPeriodStr == "" {
return nil, nil
}

To where it is used here:

if i.syncPeriod != nil {
result.RequeueAfter = randextensions.Jitter(i.rand, *i.syncPeriod, 0.25)
}

It seems to me like a defined empty string value will result in no resyncs.

To Reproduce
Steps to reproduce the behavior:

  • Install ASO with an aso-controller-settings Secret including AZURE_SYNC_PERIOD: ""
  • Create any resource.
  • Make a change to the resource outside of ASO.
  • Observe that ASO never reconciles the resource.

The particular use case I was testing when I ran into this was a ManagedClustersAgentPool with autoscaling enabled. Autoscaling events affecting the count of the node pool were never being reflected in the status.count of the ASO resource.

Expected behavior
Both the documentation and actual logic are aligned as to the behavior of the empty string. I'd slightly prefer the implementation be updated to match the docs than the other way around, such that an empty string implies the 1h default. Either way is acceptable to me though.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

@matthchr
Copy link
Member

Agree that this seems like a bug, although reading through the code it does seem like setting it to empty string to mean "no re-sync" may be intended:

// SyncPeriod is the frequency at which resources are re-reconciled with Azure
// when there have been no triggering changes in the Kubernetes resources. This sync
// exists to detect and correct changes that happened in Azure that Kubernetes is not
// aware about. BE VERY CAREFUL setting this value low - even a modest number of resources
// can cause subscription level throttling if they are re-synced frequently.
// If nil, no sync is performed. Durations are specified as "1h", "15m", or "60s". See
// https://pkg.go.dev/time#ParseDuration for more details.
//
// This can be set to nil by specifying empty string for AZURE_SYNC_PERIOD explicitly in
// the config.
SyncPeriod *time.Duration

Specifically the last bit.

So this actually looks like a documentation bug to me, more than a behavior bug. Do you think that omitted (== default) being different than specified but empty is awkward/difficult to work with in your context? Can you expand more on the whys of that, if so?

@matthchr
Copy link
Member

matthchr commented Apr 23, 2024

Maybe it would be better if we had 0s mean "do not re-sync", rather than drawing a distinction between omitted and set to empty string. WDYT @theunrepentantgeek and @super-harsh?

@nojnhuh
Copy link
Member Author

nojnhuh commented Apr 23, 2024

It's only easier in CAPZ for empty string to mean "default" because we use envsubst to inject it like this:

stringData:
  AZURE_SYNC_PERIOD: ${AZURE_SYNC_PERIOD:=""}

I haven't found a clean way for us to omit the key entirely while still making it configurable for our users. That's definitely more of a CAPZ problem than an ASO problem though. I'd say to do whatever makes the most sense for ASO, just offered that preference in case there wasn't any other reason to prefer one over the other.

@matthchr matthchr added the documentation Improvements or additions to documentation label Apr 23, 2024
@theunrepentantgeek
Copy link
Member

I don't like the idea of 0s meaning never - because 0s looks like a duration, but it's not. What about using a different sentinel value in the config, like never? We can turn it into something else internally.

@matthchr
Copy link
Member

I don't like the idea of 0s meaning never - because 0s looks like a duration, but it's not. What about using a different sentinel value in the config, like never? We can turn it into something else internally.

We could do a negative duration (which is pretty common for indicating never), but I also don't love that. It might be better than customizing parsing though and/or treating omitted as different than specified-but-empty.

@theunrepentantgeek theunrepentantgeek added this to the v2.8.0 milestone Apr 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🪲 Something isn't working documentation Improvements or additions to documentation
Projects
Development

No branches or pull requests

3 participants