Handle processing of a sample featuregate #408

bpradipt · 2024-05-16T12:54:58Z

A few key things to note:

- Use a single sample feature gate.
- Move the feature gate name into a string constant.
- The feature gates are handled within the KataConfig reconcile loop.
- If feature gate processing hits an error, the reconciliation continues and the request is not re-queued.
- If an error prevents determining whether a feature gate is enabled (for example, the configMap was deleted or the API server returned some other error), the gate falls back to its compiled-in default state.
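
The fallback rule in the last point can be sketched roughly as follows. This is only an illustration, not the code in this PR: the gate name `sampleFeature`, the `defaultGates` table, the `errLookup` value and the `lookup` callback (standing in for the configMap read) are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// SampleFeatureGate is a hypothetical gate name kept as a string constant.
const SampleFeatureGate = "sampleFeature"

// defaultGates holds the compiled-in default state of each gate (assumed values).
var defaultGates = map[string]bool{
	SampleFeatureGate: false,
}

// errLookup simulates any failure to read the gate (configMap deleted, API error).
var errLookup = errors.New("cannot read feature gate configMap")

// gateState returns the state reported by lookup, or the compiled-in
// default for the gate when lookup fails.
func gateState(feature string, lookup func(string) (bool, error)) bool {
	enabled, err := lookup(feature)
	if err != nil {
		return defaultGates[feature]
	}
	return enabled
}

func main() {
	failing := func(string) (bool, error) { return false, errLookup }
	working := func(string) (bool, error) { return true, nil }

	fmt.Println(gateState(SampleFeatureGate, failing)) // default: false
	fmt.Println(gateState(SampleFeatureGate, working)) // from lookup: true
}
```

Note that with this rule the caller never sees the error, which is why reconciliation can continue without a re-queue.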

internal/featuregates/featuregates.go

controllers/fg_handler.go

pmores · 2024-05-16T13:45:56Z

@bpradipt This PR looks good to me as it stands and I believe I'll be able to approve shortly.

I'd like to ask though if you have any plans to deal with the FeatureGates structure (since this PR relies on it implicitly). To summarise and expand what I said on PR #394: the structure looks superfluous to me frankly. Looking at its members, ConfigMapName is a compile-time constant so it can live in the binary's static data, no need to store it dynamically. Namespace can be treated either as a compile-time constant as well, or we could go slightly more dynamic and derive it from the KataConfig instance namespace - in either case I can see little to no need to store it in a separate structure. Finally, Client is already stored in KataConfigOpenShiftReconciler.

Also, its IsEnabled() member fetches the corresponding configmap anew on each query. Is that necessary? I think it would be preferable to fetch the configmap once per reconciliation somewhere near the top of Reconcile() then pass it to feature gate processing. It could be stored in KataConfigOpenShiftReconciler for convenience if need be (and provided it's re-fetched on each reconcile). Then IsEnabled() could just fetch a value from the cm if present or replace it with a default if not.

bpradipt · 2024-05-16T15:09:09Z

> @bpradipt This PR looks good to me as it stands and I believe I'll be able to approve shortly.

> I'd like to ask though if you have any plans to deal with the FeatureGates structure (since this PR relies on it implicitly). To summarise and expand what I said on PR #394: the structure looks superfluous to me frankly. Looking at its members, ConfigMapName is a compile-time constant so it can live in the binary's static data, no need to store it dynamically. Namespace can be treated either as a compile-time constant as well, or we could go slightly more dynamic and derive it from the KataConfig instance namespace - in either case I can see little to no need to store it in a separate structure. Finally, Client is already stored in KataConfigOpenShiftReconciler.

Valid point. I'll rethink the FG struct. And even if we need this struct, we might not need it to be part of the KataConfigOpenShiftReconciler struct.

> Also, its IsEnabled() member fetches the corresponding configmap anew on each query. Is that necessary? I think it would be preferable to fetch the configmap once per reconciliation somewhere near the top of Reconcile() then pass it to feature gate processing. It could be stored in KataConfigOpenShiftReconciler for convenience if need be (and provided it's re-fetched on each reconcile). Then IsEnabled() could just fetch a value from the cm if present or replace it with a default if not.

While reading more about controller-runtime I found that it already uses caching. This was unclear to me earlier, and I had preferred retrieving the configMap once and using it for processing. With this new info I'm not sure, and hence left the existing code as-is. What do you recommend?

bpradipt · 2024-05-20T10:59:30Z

> > @bpradipt This PR looks good to me as it stands and I believe I'll be able to approve shortly.
> > I'd like to ask though if you have any plans to deal with the FeatureGates structure (since this PR relies on it implicitly). To summarise and expand what I said on PR #394: the structure looks superfluous to me frankly. Looking at its members, ConfigMapName is a compile-time constant so it can live in the binary's static data, no need to store it dynamically. Namespace can be treated either as a compile-time constant as well, or we could go slightly more dynamic and derive it from the KataConfig instance namespace - in either case I can see little to no need to store it in a separate structure. Finally, Client is already stored in KataConfigOpenShiftReconciler.

> Valid point. I'll rethink the FG struct. And even if we need this struct, we might not need it to be part of the KataConfigOpenShiftReconciler struct.

> > Also, its IsEnabled() member fetches the corresponding configmap anew on each query. Is that necessary? I think it would be preferable to fetch the configmap once per reconciliation somewhere near the top of Reconcile() then pass it to feature gate processing. It could be stored in KataConfigOpenShiftReconciler for convenience if need be (and provided it's re-fetched on each reconcile). Then IsEnabled() could just fetch a value from the cm if present or replace it with a default if not.

> While reading more about controller-runtime I found that it already uses caching. This was unclear to me earlier, and I had preferred retrieving the configMap once and using it for processing. With this new info I'm not sure, and hence left the existing code as-is. What do you recommend?

@pmores I have added new commits to remove the FG struct and read the configmap during reconciliation.
Please check and let me know if this is what you were looking for.

bpradipt · 2024-05-21T04:55:52Z

> @pmores I have added new commits to remove the FG struct and read the configmap during reconciliation. Please check and let me know if this is what you were looking for.

@pmores would it be better to have the FG status struct as part of the KataConfigOpenShiftReconciler struct itself, in case there is a need for its use in a later part of the reconcile method?

controllers/fg_handler.go

internal/featuregates/featuregates.go

pmores · 2024-05-21T13:37:14Z

> > @pmores I have added new commits to remove the FG struct and read the configmap during reconciliation. Please check and let me know if this is what you were looking for.

> @pmores would it be better to have the FG status struct as part of the KataConfigOpenShiftReconciler struct itself, in case there is a need for its use in a later part of the reconcile method?

Yeah, that occurred to me as well. :-) As long as we can keep gated feature processing limited to processFeatureGates() we don't have to put the FG status into KataConfigOpenShiftReconciler. However, my hunch is that sooner or later (actually rather sooner than later :-)) there will be a feature that needs to affect the control flow of the main controller algorithms (installation, uninstallation), and then having the FG status in the reconciler struct will come in handy. So I think it's fine to put it there - as long as we take care to load it at the beginning of each reconciliation - and we can do it now or we can wait until it's actually needed (I'd probably slightly prefer the latter).

beraldoleal

Hi @bpradipt, I left a few comments for you.

Also, it would be much appreciated if the commit messages explained why you are proposing the changes instead of listing what they do. This would help reviewers understand what you have in mind with those changes. Do you mind expanding the reasoning in the commit message, please?

controllers/fg_handler.go

internal/featuregates/featuregates.go

beraldoleal · 2024-05-21T13:46:56Z

internal/featuregates/featuregates.go

+
+func IsEnabled(fgStatus *FeatureGateStatus, feature string) bool {
+
+	return fgStatus.FeatureGates[feature]


Since feature is a string, and anything could be passed in, I'm missing a KeyError handler here.

bpradipt · 2024-05-21

@beraldoleal can you please share an example?

pmores · 2024-05-21

Well, as far as I can see this function will only ever be called with one of a limited set of compile-time string constants as a parameter - it's not like we read a string externally (e.g. from the user) and pass it to IsEnabled(). If that's the case, key existence checking doesn't seem strictly necessary, and adding it could arguably confuse a reader by suggesting a non-existent key can be passed in.

So I'd say, double-check that my premise above is correct and if so, make this an explicit feature of the code design by not including the check (maybe add a comment to that effect). However, if there is a chance that the premise doesn't hold, even a slight one, let's stay on the safe side and do the checking.

bpradipt · 2024-05-21

Sorry, I'm unable to understand this, hence my question. The FG status is populated during every reconcile and checked by the operator code, so the code has to ensure the right feature is checked.
Also, if a missing feature is checked, it'll return the default bool value, i.e. false.
Check this - https://play.golang.com/p/X2N0sc5fNi6

pmores · 2024-05-21

Right, maybe I misinterpreted Beraldo's concern. @beraldoleal could you elaborate on what you're worried about?

bpradipt · 2024-05-21

Let's handle this case in the future, please.

pmores · 2024-05-21

Frankly, I'd probably leave it as is. First, gated feature identifiers are very likely to stay the way they are - compile-time constants controlled by this project. In other words, I don't see much scope for them to become somewhat arbitrary strings. Also, it could be argued that if a feature doesn't exist then it's necessarily disabled :-) so even the worst-case semantics don't really disturb me.

bpradipt · 2024-05-21 (edited)

Note that the caller (i.e. the operator code) needs to explicitly query to check a feature gate. So why would we willingly write code for a non-existent feature? If there are cases of using the featuregate module elsewhere, then it might make sense. Let's handle it when there is a requirement.

pmores · 2024-05-21

Yes, I think it's robust.

beraldoleal · 2024-05-21

> So why would we willingly write code for a non-existent feature?

> First, gated feature identifiers are very likely to stay the way they are - compile-time constants controlled by this project. In other words, I don't see much scope for them to become somewhat arbitrary strings.

We can't guarantee future code, refactoring, or features won't break this assumption. Writing code based on an assumption without enforcement is risky. However, I agree we can't account for every possibility and need to find a balance. Over-engineering is bad, but lacking some checks is also dangerous.

So, for the sake of balance, let's move forward. Approved.
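
For readers following the thread, the two positions can be illustrated with a small Go sketch. It assumes `FeatureGateStatus.FeatureGates` is a `map[string]bool` (the diff only shows it being indexed by a string and returning a bool); `IsEnabledStrict` and the gate names are hypothetical and not part of this PR.

```go
package main

import "fmt"

// FeatureGateStatus is assumed here to wrap a map from gate name to state.
type FeatureGateStatus struct {
	FeatureGates map[string]bool
}

// IsEnabled has the shape shown in the reviewed diff: indexing a Go map
// with a missing key yields the zero value, so unknown gates report false.
func IsEnabled(fgStatus *FeatureGateStatus, feature string) bool {
	return fgStatus.FeatureGates[feature]
}

// IsEnabledStrict is a hypothetical variant that uses the comma-ok form
// to tell "explicitly disabled" apart from "gate name not known at all".
func IsEnabledStrict(fgStatus *FeatureGateStatus, feature string) (enabled, known bool) {
	enabled, known = fgStatus.FeatureGates[feature]
	return enabled, known
}

func main() {
	status := &FeatureGateStatus{FeatureGates: map[string]bool{"sampleFeature": true}}

	fmt.Println(IsEnabled(status, "sampleFeature")) // true
	fmt.Println(IsEnabled(status, "typoFeature"))   // false, no panic

	enabled, known := IsEnabledStrict(status, "typoFeature")
	fmt.Println(enabled, known) // false false
}
```

The first form matches the "a feature that doesn't exist is disabled" argument; the second is roughly what an explicit key check would look like if the set of gate names ever stopped being compile-time constants.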

bpradipt · 2024-05-21T14:24:16Z

> Hi @bpradipt, I left a few comments for you.

> Also, it would be much appreciated if the commit messages explained why you are proposing the changes instead of listing what they do. This would help reviewers understand what you have in mind with those changes. Do you mind expanding the reasoning in the commit message, please?

I'll add it. But for this PR specifically there is a long history of discussions. I would also suggest taking a look at them for context.

bpradipt · 2024-05-21T14:29:29Z

> > Hi @bpradipt, I left a few comments for you.
> > Also, it would be much appreciated if the commit messages explained why you are proposing the changes instead of listing what they do. This would help reviewers understand what you have in mind with those changes. Do you mind expanding the reasoning in the commit message, please?

> I'll add it. But for this PR specifically there is a long history of discussions. I would also suggest taking a look at them for context.

@beraldoleal for some context on the new changes with respect to removing the FG struct and reading the configMap once on entering the reconcile loop - #408 (comment)

bpradipt · 2024-05-21T14:30:39Z

> > > @pmores I have added new commits to remove the FG struct and read the configmap during reconciliation. Please check and let me know if this is what you were looking for.

> > @pmores would it be better to have the FG status struct as part of the KataConfigOpenShiftReconciler struct itself, in case there is a need for its use in a later part of the reconcile method?

> Yeah, that occurred to me as well. :-) As long as we can keep gated feature processing limited to processFeatureGates() we don't have to put the FG status into KataConfigOpenShiftReconciler. However, my hunch is that sooner or later (actually rather sooner than later :-)) there will be a feature that needs to affect the control flow of the main controller algorithms (installation, uninstallation), and then having the FG status in the reconciler struct will come in handy. So I think it's fine to put it there - as long as we take care to load it at the beginning of each reconciliation - and we can do it now or we can wait until it's actually needed (I'd probably slightly prefer the latter).

Ok, let's do it later when it's actually needed :-)

pmores · 2024-05-21T14:40:48Z

> > Hi @bpradipt, I left a few comments for you.
> > Also, it would be much appreciated if the commit messages explained why you are proposing the changes instead of listing what they do. This would help reviewers understand what you have in mind with those changes. Do you mind expanding the reasoning in the commit message, please?

> I'll add it. But for this PR specifically there is a long history of discussions. I would also suggest taking a look at them for context.

I know it's tedious, but some of the preliminary discussion was not public, and I agree with @beraldoleal that at least the main design decisions should ideally be publicly documented.

bpradipt · 2024-05-21T15:01:43Z

@pmores @beraldoleal I have squashed the commits and addressed the comments.
One thing pending is the following - #408 (comment)

Please tell me what you prefer and I'll make the changes accordingly. I think it's a matter of taste and doesn't impact the logic.

pmores · 2024-05-21T15:07:20Z

> @pmores @beraldoleal I have squashed the commits and addressed the comments. One thing pending is the following - #408 (comment)

> Please tell me what you prefer and I'll make the changes accordingly. I think it's a matter of taste and doesn't impact the logic.

Correct me if I'm wrong but I'm under the impression that it does make a difference. If the function never returns an error then we're bound to revert gated features to defaults on any intermittent failure to communicate with the control plane, right? Wouldn't we rather want to reschedule and try again? There might be gated features whose status change can trigger heavyweight operations in the cluster.

bpradipt · 2024-05-21T15:11:58Z

> There might be gated features whose status change can trigger heavyweight operations in the cluster.

This was precisely my argument for differentiating between deleting a configMap (and thereby disabling the FGs) and explicitly disabling an FG in the configMap :-(

pmores · 2024-05-21T15:18:49Z

> > There might be gated features whose status change can trigger heavyweight operations in the cluster.

> This was precisely my argument for differentiating between deleting a configMap (and thereby disabling the FGs) and explicitly disabling an FG in the configMap :-(

I know, but I see a significant difference between triggering a heavyweight operation due to an explicit user action (i.e. deleting the configmap) and triggering it due to an intermittent failure to talk to the control plane. I believe we should work hard to avoid the latter, as its result would be a terrible user experience - the controller all of a sudden changing the state of the cluster to an undesired one, seemingly for no reason.
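
One way to express the distinction drawn above is sketched below. It is an assumption-laden illustration of the argument, not what this PR implements: `errNotFound` and `errTransient` are stand-in sentinel errors, `loadGates` and its parameters are hypothetical, and a real controller would more likely classify the error with `apierrors.IsNotFound` from `k8s.io/apimachinery/pkg/api/errors`.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in errors for this sketch only.
var (
	errNotFound  = errors.New("configMap not found")   // explicit user action: configMap deleted
	errTransient = errors.New("control plane timeout") // intermittent failure
)

// loadGates picks the gate states to use for one reconciliation.
//   - fetch succeeds:     use what the configMap says.
//   - configMap deleted:  fall back to compiled-in defaults, no error.
//   - any other failure:  keep the previous states and return the error,
//     so the caller can re-queue instead of flipping gated features.
func loadGates(
	fetch func() (map[string]bool, error),
	defaults, previous map[string]bool,
) (map[string]bool, error) {
	current, err := fetch()
	switch {
	case err == nil:
		return current, nil
	case errors.Is(err, errNotFound):
		return defaults, nil
	default:
		return previous, fmt.Errorf("reading feature gates: %w", err)
	}
}

func main() {
	defaults := map[string]bool{"sampleFeature": false}
	previous := map[string]bool{"sampleFeature": true}

	deleted := func() (map[string]bool, error) { return nil, errNotFound }
	flaky := func() (map[string]bool, error) { return nil, errTransient }

	gates, err := loadGates(deleted, defaults, previous)
	fmt.Println(gates["sampleFeature"], err)

	gates, err = loadGates(flaky, defaults, previous)
	fmt.Println(gates["sampleFeature"], err != nil)
}
```

With this shape, only a deliberate deletion reverts gates to their defaults; a transient failure leaves the last known state in place and surfaces an error the reconciler can act on.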

beraldoleal · 2024-05-21T16:05:22Z

> > > Hi @bpradipt, I left a few comments for you.
> > > Also, it would be much appreciated if the commit messages explained why you are proposing the changes instead of listing what they do. This would help reviewers understand what you have in mind with those changes. Do you mind expanding the reasoning in the commit message, please?

> > I'll add it. But for this PR specifically there is a long history of discussions. I would also suggest taking a look at them for context.

> @beraldoleal for some context on the new changes with respect to removing the FG struct and reading the configMap once on entering the reconcile loop - #408 (comment)

I understand your approach and I see where you are coming from, but it's important to consider the reviewer's perspective. Reading through all the comments in every thread (possibly from multiple reviewers) to understand the changes can be quite unproductive.

Commits should provide immediate context and reasoning, allowing reviewers to quickly grasp the proposed changes without needing to dig through lengthy discussions.

Also, a PR is a proposal to update the git tree, with commits reflecting the desired final state. Reviewing everything twice (code and later git log) isn't very efficient. Let's aim to keep our commit messages clear and comprehensive for smoother reviews.

Few key things to note
- Use a single sample feature gate.
- Move the feature gate as a string constant.
- The feature gates are handled within the KataConfig reconcile loop.
- If there are any errors with feature gate processing, the reconciliation process continues and doesn't re-queue
- Feature gate processing, in case of any errors in determining whether it is enabled or not (for example if the configMap is deleted or some other API server errors), is same as the default compiled in state of the respective feature gate
- The FeatureGate struct is removed. Instead there is a new FeatureGateStatus struct that is populated in the beginning of the reconcile loop with the status of the feature gates from the configMap. This ensures that the entire feature gate configMap is only read once from the API server instead of making repeated calls to the API server for checking individual feature gates.
- The IsEnabled method to check the status of individual feature gate is adapted to use the new FeatureGateStatus struct

Related to #KATA-2947

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
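
The FeatureGateStatus change described in the commit message above might look roughly like the sketch below. Only the names `FeatureGateStatus`, `FeatureGates` and `IsEnabled` come from this PR; the constructor `newFeatureGateStatus`, the `"true"`/`"false"` value format and the handling of unparsable values are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// FeatureGateStatus holds the gate states resolved once per reconciliation.
type FeatureGateStatus struct {
	FeatureGates map[string]bool
}

// newFeatureGateStatus (hypothetical) builds the status from the string
// key/value data of a configMap that was fetched once by the caller.
// It starts from the compiled-in defaults and overlays every value that
// parses as a boolean; unparsable values leave the default untouched.
func newFeatureGateStatus(cmData map[string]string, defaults map[string]bool) *FeatureGateStatus {
	gates := make(map[string]bool, len(defaults))
	for name, state := range defaults {
		gates[name] = state
	}
	for name, raw := range cmData {
		enabled, err := strconv.ParseBool(raw)
		if err != nil {
			continue
		}
		gates[name] = enabled
	}
	return &FeatureGateStatus{FeatureGates: gates}
}

// IsEnabled only reads the in-memory status; it makes no API calls.
func IsEnabled(fgStatus *FeatureGateStatus, feature string) bool {
	return fgStatus.FeatureGates[feature]
}

func main() {
	defaults := map[string]bool{"sampleFeature": false}
	cmData := map[string]string{"sampleFeature": "true", "broken": "maybe"}

	status := newFeatureGateStatus(cmData, defaults)
	fmt.Println(IsEnabled(status, "sampleFeature")) // true
	fmt.Println(IsEnabled(status, "broken"))        // false
}
```

Because the status is built at the top of the reconcile loop and passed along, every later `IsEnabled` call within that reconciliation sees the same snapshot of the configMap.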

pmores

lgtm, thanks @bpradipt !

beraldoleal

Thanks, LGTM.

bpradipt · 2024-05-21T17:07:19Z

Thanks guys.. Merging this without delay ;)

openshift-ci · 2024-05-21T17:07:50Z

@bpradipt: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/check | `cd2f646` | link | false | `/test check` |

bpradipt requested a review from pmores May 16, 2024 12:57

openshift-ci bot requested review from cpmeadors and jensfr May 16, 2024 12:59

beraldoleal reviewed May 16, 2024

internal/featuregates/featuregates.go (resolved)

controllers/fg_handler.go (resolved)

beraldoleal reviewed May 21, 2024

controllers/fg_handler.go (outdated, resolved)

controllers/fg_handler.go (resolved)

pmores reviewed May 21, 2024

internal/featuregates/featuregates.go (resolved)

beraldoleal reviewed May 21, 2024

bpradipt force-pushed the fg-rc branch from c7353ba to 000cb47 Compare May 21, 2024 14:59

bpradipt force-pushed the fg-rc branch from 000cb47 to af16755 Compare May 21, 2024 16:42

bpradipt force-pushed the fg-rc branch from af16755 to cd2f646 Compare May 21, 2024 16:45

bpradipt requested review from pmores and beraldoleal May 21, 2024 16:47

pmores approved these changes May 21, 2024

beraldoleal approved these changes May 21, 2024

bpradipt merged commit 8a0f0ae into openshift:devel May 21, 2024
2 of 4 checks passed

bpradipt deleted the fg-rc branch May 21, 2024 17:07
