Always clean up FailedToScheduleReplica with wrong HardNodeAffinity #2792
Test steps: follow the reproduction steps in longhorn/longhorn#8522 (comment).
Regression testing in: https://ci.longhorn.io/job/private/job/longhorn-tests-regression/6924/. Results show five failures:
Investigating:
One question.
5c5e165 to ec61947
LGTM.
Longhorn 8522 Signed-off-by: Eric Weber <eric.weber@suse.com>
…reconcile Longhorn 8522 Signed-off-by: Eric Weber <eric.weber@suse.com>
ec61947 to 0ca5eb6
@mergify backport v1.5.x
✅ Backports have been created
@mergify backport v1.6.x
✅ Backports have been created
Which issue(s) this PR fixes:
longhorn/longhorn#8522
What this PR does / why we need it:
Delete a failed-to-schedule replica that has HardNodeAffinity set if DataLocality is disabled.
Previously, we only did this when there were enough healthy replicas, which led to the "deadlock" in longhorn/longhorn#8522: we would not schedule more replicas until the failed-to-schedule one was deleted, but we would not delete it until there were enough healthy ones.
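The change in condition can be illustrated with a minimal sketch. This is not the code from this PR: the `Replica` and `Volume` types, their fields, and the `replicasToCleanUp` function are hypothetical simplifications, and the `requireEnoughHealthy` parameter exists only to contrast the old and new behavior side by side.

```go
package main

// Replica is a hypothetical, simplified stand-in for a Longhorn replica.
type Replica struct {
	Name             string
	Healthy          bool
	FailedToSchedule bool   // assumed flag: the scheduler could not place this replica
	HardNodeAffinity string // node the replica is pinned to; empty if unset
}

// Volume is a hypothetical, simplified stand-in for a Longhorn volume.
type Volume struct {
	DataLocalityDisabled bool
	NumberOfReplicas     int
}

// replicasToCleanUp returns the names of failed-to-schedule replicas with
// HardNodeAffinity set that should be deleted when data locality is disabled.
// With requireEnoughHealthy set to true it models the old behavior, which
// deleted such replicas only once enough healthy replicas existed.
func replicasToCleanUp(v Volume, replicas []Replica, requireEnoughHealthy bool) []string {
	if !v.DataLocalityDisabled {
		return nil
	}

	healthy := 0
	for _, r := range replicas {
		if r.Healthy {
			healthy++
		}
	}
	// Old behavior only: wait for enough healthy replicas before cleaning up.
	if requireEnoughHealthy && healthy < v.NumberOfReplicas {
		return nil
	}

	var names []string
	for _, r := range replicas {
		if r.FailedToSchedule && r.HardNodeAffinity != "" {
			names = append(names, r.Name)
		}
	}
	return names
}
```

For a volume that wants three replicas but has only two healthy ones plus one failed-to-schedule replica pinned to a node, the old condition returns nothing, so the pinned replica is never removed and no replacement is scheduled. Without the healthy-count requirement, the pinned replica is returned for deletion and scheduling can proceed.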
Special notes for your reviewer:
I experimented with changing the behavior of this function even more, but there were some weird side effects. I think it is better to keep it working almost exactly as it did before. I did, however, rearrange the code a bit.