Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle processing of a sample featuregate #408

Merged
merged 1 commit into from
May 21, 2024
Merged

Conversation

bpradipt
Copy link
Contributor

@bpradipt bpradipt commented May 16, 2024

Few key things to note

  • Use a single sample feature gate.
  • Move the feature gate as a string constant.
  • The feature gates are handled within the KataConfig reconcile loop.
  • If there are any errors with feature gate processing, the reconciliation process continues and doesn't re-queue
  • Feature gate processing, in case of any errors in determining whether it is enabled or not (for example if the configMap is deleted or some other API server errors), is same as the default compiled in state of the respective feature gate

Related to #KATA-2947

@bpradipt bpradipt requested a review from pmores May 16, 2024 12:57
@openshift-ci openshift-ci bot requested review from cpmeadors and jensfr May 16, 2024 12:59
@pmores
Copy link
Contributor

pmores commented May 16, 2024

@bpradipt This PR looks good to me as it stands and I believe I'll be able to approve shortly.

I'd like to ask though if you have any plans to deal with the FeatureGates structure (since this PR relies on it implicitly). To summarise and expand what I said on PR #394: the structure looks superfluous to me frankly. Looking at its members, ConfigMapName is a compile-time constant so it can live in the binary's static data, no need to store it dynamically. Namespace can be treated either as a compile-time constant as well, or we could go slightly more dynamic and derive it from the KataConfig instance namespace - in either case I can see little to no need to store it in a separate structure. Finally, Client is already stored in KataConfigOpenShiftReconciler.

Also, its IsEnabled() member fetches the corresponding configmap on each query anew. Is that necessary? I think it would be preferable to fetch the configmap once per reconciliation somewhere near the top of Reconcile() then pass it to feature gate processing. It could be stored in KataConfigOpenShiftReconciler for convenience if need be (and provided it's re-fetched on each reconcile). Then IsEnabled() could just fetch a value from the cm if present or replace it with a default it not.

@bpradipt
Copy link
Contributor Author

@bpradipt This PR looks good to me as it stands and I believe I'll be able to approve shortly.

I'd like to ask though if you have any plans to deal with the FeatureGates structure (since this PR relies on it implicitly). To summarise and expand what I said on PR #394: the structure looks superfluous to me frankly. Looking at its members, ConfigMapName is a compile-time constant so it can live in the binary's static data, no need to store it dynamically. Namespace can be treated either as a compile-time constant as well, or we could go slightly more dynamic and derive it from the KataConfig instance namespace - in either case I can see little to no need to store it in a separate structure. Finally, Client is already stored in KataConfigOpenShiftReconciler.

Valid point. I'll rethink about the FG struct. And even if we need this struct, we might not need it to be part of KataConfigOpenShiftReconciler struct.

Also, its IsEnabled() member fetches the corresponding configmap on each query anew. Is that necessary? I think it would be preferable to fetch the configmap once per reconciliation somewhere near the top of Reconcile() then pass it to feature gate processing. It could be stored in KataConfigOpenShiftReconciler for convenience if need be (and provided it's re-fetched on each reconcile). Then IsEnabled() could just fetch a value from the cm if present or replace it with a default it not.

While reading more about controller-runtime I found that it already uses caching. This was unclear to me earlier and I preferred retrieving the configMap once and using it for processing. With this new info I'm not sure and hence left the existing code as-is. What do you recommend ?

@bpradipt
Copy link
Contributor Author

@bpradipt This PR looks good to me as it stands and I believe I'll be able to approve shortly.
I'd like to ask though if you have any plans to deal with the FeatureGates structure (since this PR relies on it implicitly). To summarise and expand what I said on PR #394: the structure looks superfluous to me frankly. Looking at its members, ConfigMapName is a compile-time constant so it can live in the binary's static data, no need to store it dynamically. Namespace can be treated either as a compile-time constant as well, or we could go slightly more dynamic and derive it from the KataConfig instance namespace - in either case I can see little to no need to store it in a separate structure. Finally, Client is already stored in KataConfigOpenShiftReconciler.

Valid point. I'll rethink about the FG struct. And even if we need this struct, we might not need it to be part of KataConfigOpenShiftReconciler struct.

Also, its IsEnabled() member fetches the corresponding configmap on each query anew. Is that necessary? I think it would be preferable to fetch the configmap once per reconciliation somewhere near the top of Reconcile() then pass it to feature gate processing. It could be stored in KataConfigOpenShiftReconciler for convenience if need be (and provided it's re-fetched on each reconcile). Then IsEnabled() could just fetch a value from the cm if present or replace it with a default it not.

While reading more about controller-runtime I found that it already uses caching. This was unclear to me earlier and I preferred retrieving the configMap once and using it for processing. With this new info I'm not sure and hence left the existing code as-is. What do you recommend ?

@pmores I have added new commits to remove the FG struct and read the configmap during reconciliation.
Please check and let me know if this is what you were looking at ?

@bpradipt
Copy link
Contributor Author

@pmores I have added new commits to remove the FG struct and read the configmap during reconciliation. Please check and let me know if this is what you were looking at ?

@pmores would it be better to have the FG status struct as part of the KataConfigOpenShiftReconciler struct itself in case there is a need for it's use in later part of the reconcile method ?

controllers/fg_handler.go Outdated Show resolved Hide resolved
controllers/fg_handler.go Show resolved Hide resolved
@pmores
Copy link
Contributor

pmores commented May 21, 2024

@pmores I have added new commits to remove the FG struct and read the configmap during reconciliation. Please check and let me know if this is what you were looking at ?

@pmores would it be better to have the FG status struct as part of the KataConfigOpenShiftReconciler struct itself in case there is a need for it's use in later part of the reconcile method ?

Yeah, that occured to me as well. :-) As long as we can keep gated feature processing limited to processFeatureGates() we don't have to put the FG status into KataConfigOpenShiftReconciler. However my hunch is that sooner or later (actually rather sooner than later :-)) there will be a feature that needs to affect the control flow of the main controller algorithms (installation, uninstallation), and then having the FG status in the reconciler struct will come handy. So I think it's fine to put it there - as long as we take care to load it at the beginning of each reconciliation - and we can do it now or we can wait until it's actually needed (I'd probably slightly prefer the latter).

Copy link
Member

@beraldoleal beraldoleal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bpradipt, I left a few comments for you.

Also, it would be much appreciated if the commit messages was explaining why you are proposing the changes instead of listing what is doing. This would help reviewers to think what you have in mind with those changes. Do you mind expanding the reasoning on the commit message, please?

controllers/fg_handler.go Outdated Show resolved Hide resolved
internal/featuregates/featuregates.go Outdated Show resolved Hide resolved

func IsEnabled(fgStatus *FeatureGateStatus, feature string) bool {

return fgStatus.FeatureGates[feature]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since feature is a string, and anything could be used, I'm missing here a KeyError handler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beraldoleal can you please share an example..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, as far as I can see this function will only ever be called with one of a limited set of compile-time string constants as a parameter - it's not like we read a string externally (e.g. from the user) and pass it to IsEnabled(). If that's the case, key existence checking doesn't seem strictly necessary, and adding it could arguably confuse a reader by suggesting a non-existent key can be passed in.

So I'd say, double check my premise above is correct and if so, make this an explicit feature of code design by not including checking (maybe add a comment to that effect). However if there is a chance that the premise doesn't hold, even a slight one, let's stay on the safe side and do the checking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm unable to understand this and hence my question. The FG status is populated during every reconcile and checked by the operator code so the code has to ensure right feature is checked.
Also if a missing feature is checked it'll return the default bool value, ie false.
Check this - thttps://play.golang.com/p/X2N0sc5fNi6

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, maybe I misinterpreted Beraldo's concern. @beraldoleal could you elaborate what you're worried about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's handle this case in future please..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly, it'd probably leave it as is. First, gated feature identifiers are very likely to stay the way they are - compile-time constants controlled by this project. In other words, I don't see much scope for them to become somewhat arbitrary strings. Also, it could be argued that if a feature doesn't exist then it's necessarily disabled :-) so even the worst case semantics doesn't really disturb me.

Copy link
Contributor Author

@bpradipt bpradipt May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the caller (ie the operator code) needs to explicitly query to check a feature gate. So why would we willingly write a code for non-existent feature? If there are cases about using featuregate module elsewhere then it might make sense. Let's handle it when there is a requirement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it's robust.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why would we willingly write a code for non-existent feature?

First, gated feature identifiers are very likely to stay the way they are - compile-time constants controlled by this project. In other words, I don't see much scope for them to become somewhat arbitrary strings.

We can't guarantee future code, refactoring, or features won't break this assumption. Writing code based on an assumption without enforcement is risky. However, I agree we can't account for every possibility and need to find a balance. Over-engineering is bad, but lacking some checks is also dangerous.

So, for the sake of balance, let's move forward. Approved.

@bpradipt
Copy link
Contributor Author

Hi @bpradipt, I left a few comments for you.

Also, it would be much appreciated if the commit messages was explaining why you are proposing the changes instead of listing what is doing. This would help reviewers to think what you have in mind with those changes. Do you mind expanding the reasoning on the commit message, please?

I'll add it. But specifically for this PR there is a long history of discussions. I would also suggest to take a look at them for the context.

@bpradipt
Copy link
Contributor Author

Hi @bpradipt, I left a few comments for you.
Also, it would be much appreciated if the commit messages was explaining why you are proposing the changes instead of listing what is doing. This would help reviewers to think what you have in mind with those changes. Do you mind expanding the reasoning on the commit message, please?

I'll add it. But specifically for this PR there is a long history of discussions. I would also suggest to take a look at them for the context.

@beraldoleal for some context on the new changes w.r.to removal of the FG struct and reading the configMap once during entering the reconcile loop - #408 (comment)

@bpradipt
Copy link
Contributor Author

@pmores I have added new commits to remove the FG struct and read the configmap during reconciliation. Please check and let me know if this is what you were looking at ?

@pmores would it be better to have the FG status struct as part of the KataConfigOpenShiftReconciler struct itself in case there is a need for it's use in later part of the reconcile method ?

Yeah, that occured to me as well. :-) As long as we can keep gated feature processing limited to processFeatureGates() we don't have to put the FG status into KataConfigOpenShiftReconciler. However my hunch is that sooner or later (actually rather sooner than later :-)) there will be a feature that needs to affect the control flow of the main controller algorithms (installation, uninstallation), and then having the FG status in the reconciler struct will come handy. So I think it's fine to put it there - as long as we take care to load it at the beginning of each reconciliation - and we can do it now or we can wait until it's actually needed (I'd probably slightly prefer the latter).

Ok, let's do it later when it's actually needed :-)

@pmores
Copy link
Contributor

pmores commented May 21, 2024

Hi @bpradipt, I left a few comments for you.
Also, it would be much appreciated if the commit messages was explaining why you are proposing the changes instead of listing what is doing. This would help reviewers to think what you have in mind with those changes. Do you mind expanding the reasoning on the commit message, please?

I'll add it. But specifically for this PR there is a long history of discussions. I would also suggest to take a look at them for the context.

I know it's tedious but some of the preliminary discussion was not public, and I agree with @beraldoleal that at least main design decisions should ideally be publicly documented.

@bpradipt
Copy link
Contributor Author

@pmores @beraldoleal I have squashed the commits and addressed the comments.
One thing pending is the following - #408 (comment)

Please tell me what you prefer and I'll make the changes accordingly. I think it's a matter of taste and doesn't impact the logic.

@pmores
Copy link
Contributor

pmores commented May 21, 2024

@pmores @beraldoleal I have squashed the commits and addressed the comments. One thing pending is the following - #408 (comment)

Please tell me what you prefer and I'll make the changes accordingly. I think it's a matter of taste and doesn't impact the logic.

Correct me if I'm wrong but I'm under the impression that it does make a difference. If the function never returns an error then we're bound to revert gated features to defaults on any intermittent failure to communicate with the control plane, right? Wouldn't we rather want to reschedule and try again? There might be gated features whose status change can trigger heavyweight operations in the cluster.

@bpradipt
Copy link
Contributor Author

There might be gated features whose status change can trigger heavyweight operations in the cluster.

This was precisely my argument w.r.to differentiating between deleting a configMap and disabling the FGs vs explicitly disabling an FG in the configMap :-(

@pmores
Copy link
Contributor

pmores commented May 21, 2024

There might be gated features whose status change can trigger heavyweight operations in the cluster.

This was precisely my argument w.r.to differentiating between deleting a configMap and disabling the FGs vs explicitly disabling an FG in the configMap :-(

I know, but I see a significant difference between triggering a heavyweight operation due to an explicit user action (i.e. deleting the configmap) vs due to a intermittent failure to talk to the control plane. I believe we should work hard to avoid the latter as its result would be a terrible user experience - the controller all of a sudden changing the state of the cluster to an undesired one, seemingly for no reason.

@beraldoleal
Copy link
Member

Hi @bpradipt, I left a few comments for you.
Also, it would be much appreciated if the commit messages was explaining why you are proposing the changes instead of listing what is doing. This would help reviewers to think what you have in mind with those changes. Do you mind expanding the reasoning on the commit message, please?

I'll add it. But specifically for this PR there is a long history of discussions. I would also suggest to take a look at them for the context.

@beraldoleal for some context on the new changes w.r.to removal of the FG struct and reading the configMap once during entering the reconcile loop - #408 (comment)

I understand your approach and I got where you are coming from, but it's important to consider the reviewer's perspective. Reading through all comments in every thread (eventually from multiple reviewers) to understand the changes can be quite unproductive.

Commits should provide immediate context and reasoning, allowing reviewers to quickly grasp the proposed changes without needing to dig through lengthy discussions.

Also, a PR is a proposal to update the git tree, with commits reflecting the desired final state. Reviewing everything twice (code and later git log) isn't very efficient. Let's aim to keep our commit messages clear and comprehensive for smoother reviews.

Few key things to note
- Use a single sample feature gate.
- Move the feature gate as a string constant.
- The feature gates are handled within the KataConfig reconcile loop.
- If there are any errors with feature gate processing, the
  reconciliation process continues and doesn't re-queue
- Feature gate processing, in case of any errors in determining whether
  it is enabled or not (for example if the configMap is deleted or some
  other API server errors), is same as the default compiled in state of
  the respective feature gate
- The FeatureGate struct is removed. Instead there is a new
  FeatureGateStatus struct that is populated in the beginning of the
  reconcile loop with the status of the feature gates from the
  configMap. This ensures that the entire feature gate configMap is only read
  once from the API server instead of making repeated calls to the API server
  for checking individual feature gates.
- The IsEnabled method to check the status of individual feature gate is
  adapted to use the new FeatureGateStatus struct

Related to #KATA-2947

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Copy link
Contributor

@pmores pmores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks @bpradipt !

Copy link
Member

@beraldoleal beraldoleal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM.

@bpradipt
Copy link
Contributor Author

Thanks guys.. Merging this without delay ;)

@bpradipt bpradipt merged commit 8a0f0ae into openshift:devel May 21, 2024
2 of 4 checks passed
@bpradipt bpradipt deleted the fg-rc branch May 21, 2024 17:07
Copy link

openshift-ci bot commented May 21, 2024

@bpradipt: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/check cd2f646 link false /test check

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants