Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Controller Options Overrides one-pager #5557

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

max-melentyev
Copy link

@max-melentyev max-melentyev commented Apr 8, 2024

Description of your changes

We faced an issue with scaling crossplane-contrib/provider-aws for thousands of EC2 instances and route53 records. This PR is a part of the process we plan to address that issues:

Signed-off-by: max-melentyev <119438383+max-melentyev@users.noreply.github.com>
@max-melentyev max-melentyev requested review from negz and a team as code owners April 8, 2024 12:51
Different Crossplane controllers may have very different load profile and number of managed resources.
Consider examples in crossplane-contrib/provider-aws:

- Recomciling route53 records is fast, but having many resources of this type may trigger AWS API
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I my head, polling in providers serve two purposes:

  • gradually getting a resource from non-existent, to creating and finally ready/synced, and communicating this state and connection parameters to a xr. This is latency sensitive, there's often a human in the loop waiting for this.
  • drift detection and observing and state changes

I think such a change here should take this into consideration. I believe I have read something about it but I'm not on top of all the current activities these days.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Purposes are correct. What do you think is missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One idea is that we add both pollInterval (for resources not ready) and driftInterval (for other resources). But there may be other alternatives.

At least this proposal as it is would allow us to add this interval later without any big change here.

But another question is if you would need per-controller intervals if you have pollInterval and driftInterval as options? Do we need both features, or just one? And if we just have one, which would serve most users/use cases.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's decided to add a driftInterval, I think it makes sense to have it as a separate option. There can be different scenarios and people may want to response to drift faster than others.

Signed-off-by: max-melentyev <119438383+max-melentyev@users.noreply.github.com>
### Providing Configuration

Controller specific options can be provided with `--controller-option {controller-id}.{option}={value}`,
for example `--controller-option ec2.instance.pollInterval=30m`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative could be more fine-grained controls via annotations on MR metadata, which would override the provider default. Per controller config you're proposing here is something in the middle; even though they have different purposes/use cases, not sure if we need both of these configuration knobs 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, and it may require less changes in providers.
Though it makes resource owner responsible for configuring poll interval in the cases when it shouldn't care of it. For example, in provider-aws we implemented a monitor of AWS event that triggers reconciliation when resource is updated in AWS, and this allows us to increase poll interval for supported resources. Having this configuration as an annotation, will require updating all existing resources, and any CRD controllers that create these resources.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we'd have something like spec.pollPolicy (similar to spec.managementPolicies). I could imagine a few options:

  • Whatever the provider default is
  • A specific interval (30s, 2h, etc)
  • Don't poll at all? This would basically disable drift correction - MRs would only be reconciled if they changed.

@max-melentyev
Copy link
Author

I've created a PR with a preview of the feature that relies on options overrides: crossplane-contrib/provider-aws#2048

Comment on lines +19 to +20
Having a feature that makes it possible to override default poll interval for a specific controller
would allow to avoid these issues without affecting other responsiveness of controllers in the provider.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's anything else we'd want to override, apart from poll interval.

I'm thinking things like configuring exponential backoff for retries, or number of worker goroutines for a particular controller.

Interestingly poll interval feels like it could be configured on a per-MR basis, whereas the others are more a per-MR-type / per controller basis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants