
CSIStorageCapacity: Topology segment not updated #847

Open
samuelluohaoen1 opened this issue Dec 28, 2022 · 17 comments

samuelluohaoen1 commented Dec 28, 2022

What happened:
After new node plugins join the cluster and report new AccessibleTopologies.Segments, the current segment information is not getting updated. New CSIStorageCapacity objects are not being created.

What you expected to happen:
New node plugins reporting new values for existing topology keys should, in a sense, "expand" the value sets of the existing topology segments, which in turn should result in CSIStorageCapacity objects being created for the newly accessible segments.

How to reproduce it:

  1. Suppose the CSIDriver is named com.foo.bar. Check that STORAGECAPACITY is true.
  2. Deploy the controller plugin but not the node plugin. Wait for external-provisioner to print "Initial number of topology segments 0, storage classes 0, potential CSIStorageCapacity objects 0" (to see this log, run external-provisioner with log level 5).
  3. At this point CSINode should show DRIVERS: 0.
  4. Deploy the node plugin. Wait for the NodeGetInfo RPC to be called. The RPC should return something like
{
    "NodeId": "some-node",
    "AccessibleTopology": {
        "Segments": {
            "kubernetes.io/hostname": "some-node"
        }
    }
}
  5. Now CSINode should show DRIVERS: 1, with the driver named com.foo.bar, Node ID: some-node, and Topology Keys: [kubernetes.io/hostname].
  6. Deploy a StorageClass with volumeBindingMode: WaitForFirstConsumer and provisioner: com.foo.bar (see the sketch after this list).
  7. No new CSIStorageCapacity object is created.
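
For reference, a minimal sketch of step 6, assuming the placeholder driver name com.foo.bar from above (the StorageClass name is made up):

# StorageClass with delayed binding for the placeholder driver com.foo.bar
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: com-foo-bar-local   # hypothetical StorageClass name
provisioner: com.foo.bar
volumeBindingMode: WaitForFirstConsumer
EOF
# CSIStorageCapacity objects are namespaced, so list them across all namespaces
kubectl get csistoragecapacities --all-namespaces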

Anything else we need to know?:
I am using the "kubernetes.io/hostname" label as the only key because we want the topology to be constrained to each node: each PV is to be provisioned locally on some node. I also assume that "kubernetes.io/hostname" is unique across nodes and exists on every node by default (I hope this is a reasonable assumption).
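
A quick sanity check for that assumption, as a sketch ("some-node" is the placeholder node name used above):

# Confirm every node carries the kubernetes.io/hostname label and that the values are unique
kubectl get nodes -L kubernetes.io/hostname
# Confirm the registered driver lists that label as its topology key
kubectl get csinode some-node -o yaml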

Environment:

  • Driver version: v3.0.0
  • Kubernetes version (use kubectl version): 1.25+
  • OS (e.g. from /etc/os-release): Our in-house OS which is very similar to CentOS
  • Kernel (e.g. uname -a): Linux 4.18.0
  • Install tools: kubeadm
  • Others:

pohly commented Jan 2, 2023

No new CSIStorageCapacity object is created.

How do you check for this? With kubectl get csistoragecapacities or kubectl get --all-namespaces csistoragecapacities?

CSIStorageCapacity objects are namespaced, so the second command has to be used.

I tried to reproduce the issue with csi-driver-host-path v1.10.0, but there I get new CSIStorageCapacity objects after creating a storage class.

pohly commented Jan 2, 2023

My commands:

/deploy/kubernetes-distributed/deploy.sh
kubectl delete storageclass.storage.k8s.io/csi-hostpath-slow
kubectl delete storageclass.storage.k8s.io/csi-hostpath-fast
kubectl get --all-namespaces csistoragecapacity
kubectl create -f deploy/kubernetes-distributed/hostpath/csi-hostpath-storageclass-fast.yaml
kubectl get --all-namespaces csistoragecapacity

pohly commented Jan 2, 2023

csi-provisioner:v3.3.0

samuelluohaoen1 (Author) commented:

No new CSIStorageCapacity object is created.

How do you check for this? With kubectl get csistoragecapacities or kubectl get --all-namespaces csistoragecapacities?

CSIStorageCapacity objects are namespaced, so the second command has to be used.

I tried to reproduce the issue with csi-driver-host-path v1.10.0, but there I get new CSIStorageCapacity objects after creating a storage class.

Yes it is indeed namespaced. My kubectl has the default namespace set to the namespace where the CSI plugins are deployed.

samuelluohaoen1 (Author) commented:

My commands:

/deploy/kubernetes-distributed/deploy.sh
kubectl delete storageclass.storage.k8s.io/csi-hostpath-slow
kubectl delete storageclass.storage.k8s.io/csi-hostpath-fast
kubectl get --all-namespaces csistoragecapacity
kubectl create -f deploy/kubernetes-distributed/hostpath/csi-hostpath-storageclass-fast.yaml
kubectl get --all-namespaces csistoragecapacity

From the sequence of your commands I do not see how the controller plugin is deployed before the node plugins. I think the order of deployment may be crucial to reproducing this issue. Could you make sure that step 2 happens before the node plugins are deployed? A rough sketch of the ordering I have in mind follows. Thank you for your trouble.
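
(The manifest names, namespace, and StatefulSet/DaemonSet split below are assumptions for illustration, not part of the csi-driver-host-path deployment scripts.)

# 1. Deploy the controller plugin (with external-provisioner) first; no node plugin yet
kubectl apply -f csi-controller.yaml
# 2. Wait for the capacity controller to log its initial state (requires -v=5)
kubectl -n csi logs statefulset/csi-controller -c csi-provisioner | grep "Initial number of topology segments"
# 3. Only now deploy the node plugin DaemonSet
kubectl apply -f csi-node.yaml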

k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Apr 6, 2023
k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on May 6, 2023
k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) on Jun 5, 2023
k8s-ci-robot commented:

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

pohly commented Jun 14, 2023

/reopen
/assign

k8s-ci-robot commented:

@pohly: Reopened this issue.

In response to this:

/reopen
/assign

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot reopened this on Jun 14, 2023
pohly added a commit to pohly/external-provisioner that referenced this issue Jun 15, 2023
kubernetes-csi#847 mentions a problem that occurs when the central controller runs before the node plugins. This should be handled by updating node segments, and the code for that exists; it just lacked some unit tests. Those tests don't trigger the problem mentioned in the issue, but it's worthwhile to add them nonetheless.
pohly commented Jun 15, 2023

@samuelluohaoen1: it looks like you are using a central controller for your CSI driver. Is that correct?

Can you perhaps share the external-provisioner log at level >= 5? There is code which should react to changes in the Node and CSIDriver objects when the node plugin gets registered after the controller has started.

We don't have a CSI driver deployment readily available to test this scenario. I tried reproducing it through unit tests (see #942) but the code worked as expected.
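
For anyone collecting that log, a hedged sketch, assuming the controller runs as a StatefulSet named csi-controller in namespace csi with a csi-provisioner container (all of these names are assumptions):

# external-provisioner uses klog, so verbosity is controlled with the -v flag;
# add "-v=5" (or higher) to the csi-provisioner container args, then capture the log:
kubectl -n csi logs statefulset/csi-controller -c csi-provisioner > provisioner.log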

xing-yang commented:

/remove-lifecycle rotten

k8s-ci-robot removed the lifecycle/rotten label on Jun 23, 2023
k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 23, 2024
yuxiang-he commented Feb 22, 2024

@pohly We observed something similar, but the CSIStorageCapacity objects were created after about an hour.

I believe there is currently an issue where the capacity controller is tracking duplicated workqueue entries. See issue #1161
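
If others want to confirm how long the objects take to appear, one simple (hedged) way is to watch them cluster-wide:

# Watch CSIStorageCapacity objects in all namespaces and note when they finally show up
kubectl get csistoragecapacities --all-namespaces --watch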

yuxiang-he commented:

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Feb 22, 2024
k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on May 22, 2024