-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-26498: Optimize Upgrade Validation plugin by skipping unnecessary changes #587
OCPBUGS-26498: Optimize Upgrade Validation plugin by skipping unnecessary changes #587
Conversation
@gcs278: This pull request references Jira Issue OCPBUGS-26498, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
caddb91
to
6746855
Compare
Though this is a valid improvement, I'm closing in favor of #588 as I now don't think this PR fixes the /close |
@gcs278: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@gcs278: This pull request references Jira Issue OCPBUGS-26498. The bug has been updated to no longer refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@gcs278: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@gcs278: This pull request references Jira Issue OCPBUGS-26498, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
6746855
to
b3fcf1e
Compare
b3fcf1e
to
73ab620
Compare
pkg/router/controller/status.go
Outdated
// We only do this in for the new UnservableInFutureVersions condition to avoid perturbing existing logic for the | ||
// Admitted condition. This should be reevaluated when refactoring to use a single route status update per route | ||
// reconciliation (see TODO at top of file). | ||
changed, _, _, _, _ := recordIngressCondition(route.DeepCopy(), a.routerName, a.routerCanonicalHostname, expectedCondition) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment to note that route.DeepCopy()
is necessary because recordIngressCondition
actually modifies route
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider refactoring to use findCondition
instead of recordIngressCondition
to save some overhead from deepcopies etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
73ab620
to
62ebee7
Compare
Putting a hold to wait for testing confirmation. |
pkg/router/controller/status.go
Outdated
changed := existing.Host != route.Spec.Host || | ||
existing.WildcardPolicy != route.Spec.WildcardPolicy || | ||
existing.RouterCanonicalHostname != hostName | ||
if changed { | ||
return true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading this and the unit test makes me realize that both the StatusAdmitter plugin and the UpgradeValidation plugin update the status ingress entry in case one of these fields (Host
, WildcardPolicy
, or RouterCanonicalHostname
) needs to be updated.
Suppose the route has a host name that conflicts with another route's, and then the route is updated to have a non-conflicting host name. UpgradeValidation runs first in the plugin chain, so it updates the status ingress entry first, and the StatusAdmitter plugin updates the status condition second. Do I have that right?
Would it be safer for RecordRouteUnservableInFutureVersions
to ignore changes to the status ingress entry other than the UnservableInFutureVersions status condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose the route has a host name that conflicts with another route's, and then the route is updated to have a non-conflicting host name. UpgradeValidation runs first in the plugin chain, so it updates the status ingress entry first, and the StatusAdmitter plugin updates the status condition second. Do I have that right?
Correct. Both UpgradeValidation and StatusAdmitter try to update the new host name as part of the ingress fields.
Would it be safer for RecordRouteUnservableInFutureVersions to ignore changes to the status ingress entry other than the UnservableInFutureVersions status condition?
Hmmm. Yea it might be slightly safer. If the route is also going from Rejected to Admitted, you'll have this sequence:
- Route is updated to fix hostname
- UpgradeValidation updates status to fix hostname
- StatusAdmitter updates status from Admitted=False to Admitted=True
If we limit UpgradeValidation to just changes in UnservableInFutureVersions, we should only get one status update:
- Route is updated to fix hostname
- UpgradeValidation does nothing since it only cares about UnservableInFutureVersions
- StatusAdmitter updates status to fix hostname and sets Admitted=False to Admitted=True
So yes, I agree. I can update it so we only care about conditions changes. That way, if it's just a hostname change, the UpgradeValidation plugin will let StatusAdmitter take care of that.
a75561c
to
7c01e09
Compare
Both performIngressConditionUpdate and performIngressConditionRemoval functions add tasks to the writerlease queue even if no work needed to be done. This commit optimizes the Upgrade Validation plugin by ensuring that tasks for updating UnservableInFutureVersions are only added to the queue when changes are required. This reduction in unnecessary work significantly lowers the frequency of lease extensions. Previously, excessive lease extensions could delay route status updates under certain conditions, such as during temporary contention periods where a router pod gets demoted to a follower. After the contention is resolved, the pod’s subsequent retry will be delayed more than necessary. Additionally, this update provides a clearer indication of when updates are actually required, but have been completed by another router pod. The prior logic did not clearly distinguish between unnecessary updates and updates that were completed by another pod. The selective updates are only added to the interface used by Upgrade Validation plugin to avoid perturbing existing Admitted condition logic. This means nominal clusters without SHA1 routes should not be impacted by the Upgrade Validation plugin, minimizing the risk associated with its introduction.
7c01e09
to
13ab1dd
Compare
Thanks! |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Miciah The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@gcs278: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
I've ran nearly a hundred times with my e2e test in openshift/origin#28710 and no issues. Reviewed the router logs and lease extends are drastically reduced. |
@gcs278: Jira Issue OCPBUGS-26498: Some pull requests linked via external trackers have merged: The following pull requests linked via external trackers have not merged: These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-26498 has not been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[ART PR BUILD NOTIFIER] This PR has been included in build ose-haproxy-router-base-container-v4.16.0-202405020546.p0.g5610ac8.assembly.stream.el9 for distgit ose-haproxy-router-base. |
Both performIngressConditionUpdate and performIngressConditionRemoval functions add tasks to the writerlease queue even if no work needed to be done. This commit optimizes the Upgrade Validation plugin by ensuring that tasks for updating UnservableInFutureVersions are only added to the queue when changes are required.
This reduction in unnecessary work significantly lowers the frequency of lease extensions. Previously, excessive lease extensions could delay route status updates under certain conditions, such as during temporary contention periods
where a router pod gets demoted to a follower. After the contention is resolved, the pod’s subsequent retry will be delayed more than necessary.
Additionally, this update provides a clearer indication of when updates are actually required, but have been completed by another router pod. The prior logic did not clearly distinguish between unnecessary updates and updates that were completed by another pod.
The selective updates are only added to the interface used by Upgrade Validation plugin to avoid perturbing existing Admitted condition logic. This means nominal clusters without SHA1 routes should not be impacted by the Upgrade
Validation plugin, minimizing the risk associated with its introduction.
Fixes TestRouteAdmissionPolicy Flake by reducing the number of unnecessary lease extensions.