Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distributed provisioning: avoid errors from lib about foreign nodes #670

Closed
wants to merge 1 commit into from

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Sep 2, 2021

What type of PR is this?
/kind cleanup

What this PR does / why we need it:

The provisioner receives PVCs for all nodes. This is necessary to support
provisioning of PVCs with immediate binding. When it receives a PVC with a
selected node that isn't the local one, the fake node informer didn't have an
object for that node, which caused this error and a corresponding event:

  E0902 17:33:24.229890       1 controller.go:981] error syncing claim "f5008265-3c43-4385-a78d-4fa3f72fb611": failed to get target node: node "aks-workerpool-15818640-vmss00000a" not found

Which issue(s) this PR fixes:
Fixes #669

Special notes for your reviewer:

This could be avoided if the lib didn't look up the node for the
provisioner. We discussed that a while back and decided against it at the
time. Given that situation, the simplest solution is to ensure that the lib
always gets a node object.

Does this PR introduce a user-facing change?:

distributed provisioning: avoid harmless errors about "failed to get target node"

The provisioner receives PVCs for all nodes. This is necessary to support
provisioning of PVCs with immediate binding. When it receives a PVC with a
selected node that isn't the local one, the fake node informer didn't have an
object for that node, which caused this error and a corresponding event:

  E0902 17:33:24.229890       1 controller.go:981] error syncing claim "f5008265-3c43-4385-a78d-4fa3f72fb611": failed to get target node: node "aks-workerpool-15818640-vmss00000a" not found

This could be avoided if the lib didn't look up the node for the
provisioner. We discussed that a while back and decided against it at the
time. Given that situation, the simplest solution is to ensure that the lib
always gets a node object.
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 2, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
To complete the pull request process, please assign jsafrane after the PR has been reviewed.
You can assign the PR to them by writing /assign @jsafrane in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 2, 2021
@pohly
Copy link
Contributor Author

pohly commented Sep 3, 2021

@wongma7, @jsafrane : what do you think?

The alternative is to change or extend the sig-storage-lib-external-provisioner API (don't look up Node there, instead pass node name). The solution from this PR seems simpler.

@jsafrane
Copy link
Contributor

jsafrane commented Sep 3, 2021

To me it looks quite fragile. sig-storage-lib-external-provisioner uses the nodeLister only after shouldProvision returns true. Can the external-provisioner filter out provisioning calls for other nodes in its ShouldProvision implementation?

@pohly
Copy link
Contributor Author

pohly commented Sep 3, 2021

Can the external-provisioner filter out provisioning calls for other nodes in its ShouldProvision implementation?

It already does that:

owned, err := p.checkNode(ctx, claim, nil, "should provision")

I think what happened here is that a claim was originally meant to be provisioned on the node (ShouldProvision returned true), but then got rescheduled. Why it then keeps trying to obtain the node and call Provision is a bit unclear - perhaps that is the real problem. Let me check that.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 3, 2021
@pohly
Copy link
Contributor Author

pohly commented Sep 6, 2021

ShouldProvision returned true

Good news, bad news... The bad news is that ShouldProvision in v2.2.x wasn't getting called at all when enabling storage capacity tracking. The good news is that this was fixed as part of #635, so the fix will be in v3.0.0.

This wasn't obvious because even at log -v5 there was no output about that check. I wonder whether that should be added.

But for now:
/close

@k8s-ci-robot
Copy link
Contributor

@pohly: Closed this PR.

In response to this:

ShouldProvision returned true

Good news, bad news... The bad news is that ShouldProvision in v2.2.x wasn't getting called at all when enabling storage capacity tracking. The dood news is that this was fixed as part of #635, so the fix will be in v3.0.0.

This wasn't obvious because even at log -v5 there was no output about that check. I wonder whether that should be added.

But for now:
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

distributed provisioning: PVC for other node not ignored silently
3 participants