Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cinder-csi-plugin] csi-cinder storage capacity #2597

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sergelogvinov
Copy link
Contributor

@sergelogvinov sergelogvinov commented May 16, 2024

Available capacity of disk storage

What this PR does / why we need it:

To ensure Kubernetes scheduler selects the optimal zone/region based on available disk capacity.
Also to have statistics/alerts in cluster for the specific storageClass

Which issue this PR fixes(if applicable):
for #2035, #2551

Special notes for reviewers:

kubectl get csistoragecapacities -ocustom-columns=CLASS:.storageClassName,AVAIL:.capacity,ZONE:.nodeTopology.matchLabels -A
CLASS                    AVAIL     ZONE
csi-cinder-classic-xfs   19990Gi   map[topology.cinder.csi.openstack.org/zone:nova]

Release note:

required documentation update/helm-chart/migration process

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 16, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @sergelogvinov. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 16, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jichenjc for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 16, 2024
@jichenjc
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 16, 2024
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 16, 2024
@sergelogvinov
Copy link
Contributor Author

/retest

@@ -396,6 +397,16 @@ func (os *OpenStack) GetMaxVolLimit() int64 {
return defaultMaxVolAttachLimit
}

// GetFreeCapacity returns free capacity of the block storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to say that result is expected in bytes. Which is also a bit weird to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, changed to Gb

@@ -396,6 +397,16 @@ func (os *OpenStack) GetMaxVolLimit() int64 {
return defaultMaxVolAttachLimit
}

// GetFreeCapacity returns free capacity of the block storage
func (os *OpenStack) GetFreeCapacity() (int64, error) {
res, err := volumelimit.Get(os.blockstorage).Extract()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure, but this sounds like an admin API. We can't make admin calls from CSIs, all CPO is supposed to work normally on a regular tenant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.openstack.org/api-ref/block-storage/v3/#limits-limits i did not find the role/permission list for csi-cinder.
I think this feature should be optional.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm it with DevStack and demo tenant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OVH Cloud, openstack account has computeOperator, volumeOperator permission, here doc: https://help.ovhcloud.com/csm/en-public-cloud-authenticate-api-openstack-service-account?id=kb_article_view&sysparm_article=KB0059364

but i do not know/can get the real openstack permission of it.
If you have a permission list of DevStack - i'd like to check...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked that the API is available. What puzzles me now is - how often is GetCapacity() called? Would calling this API be a significant blow on the Cinder API in the cloud?

/ok-to-test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it calls 1/min (capacity-poll-interval flag of csi-provisioner https://github.com/kubernetes-csi/external-provisioner/blob/c7f94435bd29e49edf1af0eda9c4ee2907c4d160/cmd/csi-provisioner/csi-provisioner.go#L111) for each storageClass and accessibleTopology (av zone).

flag enable-capacity by default is off, so csi-provisioner does not collect this metrics by default.

@sergelogvinov
Copy link
Contributor Author

/retest-required

@sergelogvinov
Copy link
Contributor Author

/retest-required

1 similar comment
@sergelogvinov
Copy link
Contributor Author

/retest-required

@@ -396,6 +397,21 @@ func (os *OpenStack) GetMaxVolLimit() int64 {
return defaultMaxVolAttachLimit
}

// GetFreeCapacity returns free capacity of the block storage, in GB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least this is not my understanding
the free storage usually comes from backend and I don't know whether every storage provider has this

and the API seems from here https://github.com/openstack/cinder/blob/master/cinder/api/v3/limits.py#L25 ?
if so ,it's actually the quota, not the real storage ? e.g you can have 100G max NFS pool
but you can set quota to 1T ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this is a quota.
Based on this quota, I cannot allocate more disk space as I am not the owner of the OpenStack cluster.

These metrics provide the scheduler with additional information about which availability zone the pod can be scheduled in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

um... at least the CSI GetFreeSpace is expecting to give real free space?
this might confuse a little bit

and if the quota is bigger than real storage, this is not accurate info?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, now I see that you mean. thanks.
I've tried to get the real size of the pool, but it required rule:admin_api role for the api method scheduler_extension:scheduler_stats:get_pools

So, It is better to rename the function to GetFreeQuotaStorageSpace since we cannot get the real size of the pool.

@sergelogvinov sergelogvinov force-pushed the csi-cinder-capacity branch 2 times, most recently from 71ee34a to a5a1e4d Compare May 28, 2024 16:16
Available capacity of disk storage

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
@sergelogvinov
Copy link
Contributor Author

/retest-required

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants