
CSIStorageCapacity: Topology segment not updated #847

Open
samuelluohaoen1 opened this issue Dec 28, 2022 · 17 comments

samuelluohaoen1 commented Dec 28, 2022

What happened:
After new node plugins join the cluster and report new AccessibleTopologies.Segments, the current segment information is not getting updated. New CSIStorageCapacity objects are not being created.

What you expected to happen:
New node plugins reporting new values for existing topology keys should, in a sense, "expand" the value sets of the existing topology segments, which in turn should result in CSIStorageCapacity objects being created for the newly accessible segments.

How to reproduce it:

  1. Suppose the CSIDriver is named com.foo.bar. Check that STORAGECAPACITY is true.
  2. Deploy the controller plugin but not the node plugin. Wait for external-provisioner to print "Initial number of topology segments 0, storage classes 0, potential CSIStorageCapacity objects 0" (to see this log, run external-provisioner with log level 5).
  3. At this point CSINode should show DRIVERS: 0.
  4. Deploy the node plugin. Wait for the NodeGetInfo RPC to be called. The RPC should return something like
{
    "NodeId": "some-node",
    "AccessibleTopology": {
        "Segments": {
            "kubernetes.io/hostname": "some-node"
        }
    }
}
  5. Now CSINode should show DRIVERS: 1, with the driver named com.foo.bar, Node ID: some-node, and Topology Keys: [kubernetes.io/hostname].
  6. Deploy a StorageClass with volumeBindingMode: WaitForFirstConsumer and provisioner: com.foo.bar (see the sketch after this list).
  7. No new CSIStorageCapacity object is created.
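
For reference, a minimal sketch of step 6, assuming the placeholder driver name com.foo.bar from above (the StorageClass name is made up):

# StorageClass with delayed binding for the placeholder driver com.foo.bar
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: com-foo-bar-local   # hypothetical StorageClass name
provisioner: com.foo.bar
volumeBindingMode: WaitForFirstConsumer
EOF
# CSIStorageCapacity objects are namespaced, so list them across all namespaces
kubectl get csistoragecapacities --all-namespaces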

Anything else we need to know?:
I am using the "kubernetes.io/hostname" label as the only key because we want the topology to be constrained to each node: each PV is to be provisioned locally on some node. I also assume that "kubernetes.io/hostname" is unique across nodes and exists on every node by default (I hope this is a reasonable assumption).
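
A quick sanity check for that assumption, as a sketch ("some-node" is the placeholder node name used above):

# Confirm every node carries the kubernetes.io/hostname label and that the values are unique
kubectl get nodes -L kubernetes.io/hostname
# Confirm the registered driver lists that label as its topology key
kubectl get csinode some-node -o yaml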

Environment:

  • Driver version: v3.0.0
  • Kubernetes version (use kubectl version): 1.25+
  • OS (e.g. from /etc/os-release): Our in-house OS which is very similar to CentOS
  • Kernel (e.g. uname -a): Linux 4.18.0
  • Install tools: kubeadm
  • Others:

pohly commented Jan 2, 2023

No new CSIStorageCapacity object is created.

How do you check for this? With kubectl get csistoragecapacities or kubectl get --all-namespaces csistoragecapacities?

CSIStorageCapacity objects are namespaced, so the second command has to be used.

I tried to reproduce the issue with csi-driver-host-path v1.10.0, but there I get new CSIStorageCapacity objects after creating a storage class.

pohly commented Jan 2, 2023

My commands:

/deploy/kubernetes-distributed/deploy.sh
kubectl delete storageclass.storage.k8s.io/csi-hostpath-slow
kubectl delete storageclass.storage.k8s.io/csi-hostpath-fast
kubectl get --all-namespaces csistoragecapacity
kubectl create -f deploy/kubernetes-distributed/hostpath/csi-hostpath-storageclass-fast.yaml
kubectl get --all-namespaces csistoragecapacity

pohly commented Jan 2, 2023

csi-provisioner:v3.3.0

samuelluohaoen1 (Author) commented:

No new CSIStorageCapacity object is created.

How do you check for this? With kubectl get csistoragecapacities or kubectl get --all-namespaces csistoragecapacities?

CSIStorageCapacity objects are namespaced, so the second command has to be used.

I tried to reproduce the issue with csi-driver-host-path v1.10.0, but there I get new CSIStorageCapacity objects after creating a storage class.

Yes it is indeed namespaced. My kubectl has the default namespace set to the namespace where the CSI plugins are deployed.

samuelluohaoen1 (Author) commented:

My commands:

/deploy/kubernetes-distributed/deploy.sh
kubectl delete storageclass.storage.k8s.io/csi-hostpath-slow
kubectl delete storageclass.storage.k8s.io/csi-hostpath-fast
kubectl get --all-namespaces csistoragecapacity
kubectl create -f deploy/kubernetes-distributed/hostpath/csi-hostpath-storageclass-fast.yaml
kubectl get --all-namespaces csistoragecapacity

From the sequence of your commands I do not see how the controller plugin is deployed before the node plugins. I think the order of deployment may be crucial to reproducing this issue. Could you make sure that step 2 happens before the node plugins are deployed? A rough sketch of the ordering I have in mind follows. Thank you for your trouble.
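
(The manifest names, namespace, and StatefulSet/DaemonSet split below are assumptions for illustration, not part of the csi-driver-host-path deployment scripts.)

# 1. Deploy the controller plugin (with external-provisioner) first; no node plugin yet
kubectl apply -f csi-controller.yaml
# 2. Wait for the capacity controller to log its initial state (requires -v=5)
kubectl -n csi logs statefulset/csi-controller -c csi-provisioner | grep "Initial number of topology segments"
# 3. Only now deploy the node plugin DaemonSet
kubectl apply -f csi-node.yaml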

k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Apr 6, 2023
k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on May 6, 2023
k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) on Jun 5, 2023
k8s-ci-robot commented:

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

pohly commented Jun 14, 2023

/reopen
/assign

k8s-ci-robot commented:

@pohly: Reopened this issue.

In response to this:

/reopen
/assign

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot reopened this on Jun 14, 2023
pohly added a commit to pohly/external-provisioner that referenced this issue Jun 15, 2023
kubernetes-csi#847 mentions a problem that occurs when the central controller runs before the node plugins. This should be handled by updating node segments, and the code for that exists; it just lacked some unit tests. Those tests don't trigger the problem mentioned in the issue, but it's worthwhile to add them nonetheless.
pohly commented Jun 15, 2023

@samuelluohaoen1: it looks like you are using a central controller for your CSI driver. Is that correct?

Can you perhaps share the external-provisioner log at level >= 5? There is code which should react to changes in the Node and CSIDriver objects when the node plugin gets registered after the controller has started.

We don't have a CSI driver deployment readily available to test this scenario. I tried reproducing it through unit tests (see #942) but the code worked as expected.
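
For anyone collecting that log, a hedged sketch, assuming the controller runs as a StatefulSet named csi-controller in namespace csi with a csi-provisioner container (all of these names are assumptions):

# external-provisioner uses klog, so verbosity is controlled with the -v flag;
# add "-v=5" (or higher) to the csi-provisioner container args, then capture the log:
kubectl -n csi logs statefulset/csi-controller -c csi-provisioner > provisioner.log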

xing-yang commented:

/remove-lifecycle rotten

k8s-ci-robot removed the lifecycle/rotten label on Jun 23, 2023
k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 23, 2024
yuxiang-he commented Feb 22, 2024

@pohly We observed something similar, but the CSIStorageCapacity objects were created after about an hour.

I believe there is currently an issue where the capacity controller is tracking duplicated workqueue entries. See issue #1161
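
If others want to confirm how long the objects take to appear, one simple (hedged) way is to watch them cluster-wide:

# Watch CSIStorageCapacity objects in all namespaces and note when they finally show up
kubectl get csistoragecapacities --all-namespaces --watch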

yuxiang-he commented:

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Feb 22, 2024
k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on May 22, 2024