Add metrics documentation #339

saad-ali · 2020-06-19T00:31:31Z

I need to add documentation to https://kubernetes-csi.github.io/docs/sidecar-containers.html

Background:

A new CSI Metrics Library was added to csi-lib-utils and is part of the v0.7.0 release. This library can be used to automatically generate Prometheus metrics for all CSI operations, including total count, error count, and call latency. This library was integrated into the following CSI sidecar containers:

Add prometheus metrics to CSI external-provisioner using new csi-lib-utils library (external-provisioner#388)
Add prometheus metrics to CSI external-attacher using new csi-lib-utils library (external-attacher#201)
Add prometheus metrics to CSI external-snapshotter using new csi-lib-utils library (external-snapshotter#227)
Add prometheus metrics to CSI external-resizer using new csi-lib-utils library (external-resizer#67)

New flags “--metrics-address” and “--metrics-path” are now part of all four of those sidecars. Driver deployments should set those flags to ensure the metrics are emitted.
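
As a rough illustration of what "total count, error count, and call latency" could look like to a consumer, here is a minimal sketch that parses Prometheus text-format output and summarizes it per CSI method. The metric name `csi_sidecar_operations_seconds` comes from the discussion in this issue; the label names (`driver_name`, `grpc_status_code`, `method_name`), the driver name, and the sample values are assumptions made for the example and should be checked against the real output of a sidecar.

```python
import re
from collections import defaultdict

# Hypothetical sample of what a sidecar might serve at its metrics path.
# Label names and values are assumptions for illustration only.
SAMPLE = '''\
# TYPE csi_sidecar_operations_seconds histogram
csi_sidecar_operations_seconds_sum{driver_name="example.csi.k8s.io",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume"} 1.5
csi_sidecar_operations_seconds_count{driver_name="example.csi.k8s.io",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume"} 3
csi_sidecar_operations_seconds_sum{driver_name="example.csi.k8s.io",grpc_status_code="Internal",method_name="/csi.v1.Controller/CreateVolume"} 0.5
csi_sidecar_operations_seconds_count{driver_name="example.csi.k8s.io",grpc_status_code="Internal",method_name="/csi.v1.Controller/CreateVolume"} 1
'''

# Simplified patterns: they do not handle escaped quotes inside label values.
LINE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def summarize(text, prefix="csi_sidecar_operations_seconds"):
    """Return {method: {"total", "errors", "seconds"}} from text-format metrics."""
    stats = defaultdict(lambda: {"total": 0, "errors": 0, "seconds": 0.0})
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # comments, blank lines, and samples without labels
        name = match.group(1)
        labels = dict(LABEL.findall(match.group(2)))
        value = float(match.group(3))
        method = labels.get("method_name", "")
        if name == prefix + "_count":
            stats[method]["total"] += int(value)
            # Any status other than OK is counted as an error here.
            if labels.get("grpc_status_code") != "OK":
                stats[method]["errors"] += int(value)
        elif name == prefix + "_sum":
            stats[method]["seconds"] += value
    return dict(stats)


if __name__ == "__main__":
    for method, entry in summarize(SAMPLE).items():
        mean = entry["seconds"] / entry["total"] if entry["total"] else 0.0
        print(method, entry["total"], entry["errors"], round(mean, 3))
```

In a real deployment one would query Prometheus rather than parse the text by hand; the sketch only shows which numbers can be derived from the exported samples.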

pohly · 2020-06-19T06:22:10Z

It would be good to have a short example of how those metrics can be used. I am not sure whether that belongs in that documentation (which is probably more reference-oriented) or in a blog post.

pohly · 2020-06-19T06:55:57Z

For a full example, integration with Prometheus and a Grafana dashboard would be useful. While investigating this, I found: https://github.com/helm/charts/tree/master/stable/prometheus#scraping-pod-metrics-via-annotations

But that only works for a single metrics endpoint per pod. When running external-provisioner, external-attacher, external-snapshotter and external-resizer all in the same StatefulSet, and thus in the same pod, it won't be that easy, right?

pohly · 2020-06-19T07:02:53Z

See prometheus/prometheus#3756

pohly · 2020-06-19T11:25:10Z

CSI calls issued by kubelet are not exported yet?

pohly · 2020-06-19T14:01:53Z

Would it make sense for CSI drivers to export the same function count metric?

The code in https://github.com/saad-ali/csi-lib-utils/blob/e9a22428988a90ba8d833b5e235fcd22d16cd5fa/metrics/metrics.go currently doesn't support that:

only has an interceptor for the gRPC client, but not the server
hard-codes "csi_sidecar" as subsystem

The subsystem string then appears in metric names like csi_sidecar_operations_seconds_count.

I could imagine that correlating those different counts may be useful, for example to detect when calls have problems at the transport level and don't reach the CSI driver.
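
To make the correlation idea concrete, here is a minimal sketch that compares per-method call counts seen by the sidecar (gRPC client) with counts seen by the driver (gRPC server). The server-side counts are hypothetical, since the library does not export them at the time of this comment; the function name `lost_calls` and the sample numbers are also made up for the example.

```python
def lost_calls(client_counts, server_counts):
    """Return, per method, how many calls the client issued that the
    server did not count. A non-empty result hints at calls failing
    at the transport level before they reach the CSI driver."""
    gaps = {}
    for method, sent in client_counts.items():
        received = server_counts.get(method, 0)
        if sent > received:
            gaps[method] = sent - received
    return gaps


if __name__ == "__main__":
    # Hypothetical counts scraped from a sidecar and from a driver.
    client = {"CreateVolume": 10, "DeleteVolume": 4}
    server = {"CreateVolume": 7, "DeleteVolume": 4}
    print(lost_calls(client, server))
```

Counters scraped at different times will never line up exactly, so a real check would compare rates over a time window instead of raw totals.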

pohly · 2020-06-19T18:52:42Z

After having read through the config documentation I believe I understand enough of it to replace or extend the example configuration such that it scrapes each sidecar container individually.

But then the problem remains that admins will have to add that to their Prometheus configuration. I don't see an easy way to do that when deploying through helm. If I understand it right, one can replace the entire default config, but not add to it.

pohly · 2020-06-22T10:02:47Z

> If I understand it right, one can replace the entire default config, but not add to it.

That turned out to be wrong. There is some limited support for extending the default configuration.

I found a solution with an additional, generic scrape config and filed helm/charts#22899 to figure out whether that is something that should be supported by the Helm chart out-of-the-box.
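
For readers unfamiliar with the problem, the following sketch models why per-container scraping needs more than a single pod annotation. Prometheus pod discovery generates one target per declared container port, so a generic scrape config can keep every port that follows a naming convention. The convention used here (port names ending in "metrics"), the pod layout, and the port numbers are assumptions for illustration and are not taken from helm/charts#22899.

```python
def per_container_targets(pod, port_name_suffix="metrics"):
    """Return (container name, "ip:port") for every declared container
    port whose name ends with the given suffix."""
    targets = []
    for container in pod["containers"]:
        for port in container.get("ports", []):
            if port.get("name", "").endswith(port_name_suffix):
                address = "{}:{}".format(pod["ip"], port["containerPort"])
                targets.append((container["name"], address))
    return targets


if __name__ == "__main__":
    # Hypothetical controller pod with two sidecars and the driver itself.
    pod = {
        "ip": "10.0.0.5",
        "containers": [
            {"name": "external-provisioner",
             "ports": [{"name": "prov-metrics", "containerPort": 8080}]},
            {"name": "external-attacher",
             "ports": [{"name": "attach-metrics", "containerPort": 8081}]},
            {"name": "driver",
             "ports": [{"name": "healthz", "containerPort": 9808}]},
        ],
    }
    for name, address in per_container_targets(pod):
        print(name, address)
```

Each sidecar would then be started with a distinct --metrics-address so that the ports do not collide inside the shared pod.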

fejta-bot · 2020-09-20T11:01:14Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

pohly · 2020-09-21T09:47:34Z

/remove-lifecycle stale

pohly · 2020-12-20T16:25:42Z

/remove-lifecycle stale
/lifecycle frozen

msau42 · 2022-08-05T22:46:09Z

/help

k8s-ci-robot · 2022-08-05T22:46:10Z

@msau42:
This request has been marked as needing help from a contributor.

pohly mentioned this issue Jun 19, 2020

extend, document and test metrics support intel/pmem-csi#666

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 20, 2020

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 21, 2020

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 20, 2020

k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 20, 2020

ramineni mentioned this issue Mar 8, 2021

[cinder-csi-plugin] Add http endpoint of CSI container kubernetes/cloud-provider-openstack#1398

Closed

pohly mentioned this issue Mar 10, 2021

additional metrics kubernetes-csi/external-provisioner#579

Merged

pohly mentioned this issue Apr 10, 2021

Add changelog for v2.2.0 kubernetes-csi/external-provisioner#605

Merged

ejweber mentioned this issue Jun 18, 2021

metrics endpoint will not be started because metrics-address was not specified ThinkParQ/beegfs-csi-driver#2

Closed

MPV mentioned this issue Jan 19, 2022

[cinder-csi-plugin] metrics are not accessible kubernetes/cloud-provider-openstack#913

Closed

k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Aug 5, 2022
