
Expose containing port of serving metrics #4877

Open · wants to merge 1 commit into base: master

Conversation

@wangxf1987 (Contributor) commented Apr 28, 2024

…erviceMonitor

What type of PR is this?
This is an improvement for installation: the port of karmada-apiserver is exposed, but the other components' ports are not.

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Expose the default port for the karmada-controller-manager, scheduler and agent when creating a PodMonitor.

@karmada-bot (Collaborator) commented

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign poor12 after the PR has been reviewed.
You can assign the PR to them by writing /assign @poor12 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Apr 28, 2024
@codecov-commenter commented Apr 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.07%. Comparing base (3232c52) to head (a4d9be4).
Report is 13 commits behind head on master.


Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4877      +/-   ##
==========================================
+ Coverage   53.05%   53.07%   +0.01%     
==========================================
  Files         250      251       +1     
  Lines       20396    20389       -7     
==========================================
- Hits        10822    10821       -1     
+ Misses       8855     8854       -1     
+ Partials      719      714       -5     
Flag       Coverage Δ
unittests  53.07% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@RainbowMango (Member) commented

/assign @chaosi-zju

@chaosi-zju (Member) commented

Hi! @wangxf1987 Glad to see your contribution!

The CI of this PR failed because the commit wasn't signed off. Please use git commit -s -m 'your message', or include a Signed-off-by: AuthorName <authoremail@example.com> line in the commit message, to pass the DCO check.

A detailed guideline is available at: https://probot.github.io/apps/dco/
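
For example, to fix an already-pushed commit, one option is to amend it with a sign-off and update the branch (this assumes the PR contains a single commit):

# Add a Signed-off-by trailer to the existing commit, keeping its message,
# then update the PR branch.
git commit --amend -s --no-edit
git push --force-with-lease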

…erviceMonitor

Signed-off-by: wangxiaofei67 <wangxiaofei67@jd.com>
@wangxf1987 (Contributor, Author) commented

> Hi! @wangxf1987 Glad to see your contribution! The CI of this PR failed because the commit wasn't signed off. […]

Done.

@chaosi-zju (Member) commented

/lgtm

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 30, 2024
@chaunceyjiang (Member) commented

Can you write the release notes? @wangxf1987

@wangxf1987 (Contributor, Author) commented

> Can you write the release notes? @wangxf1987

@chaunceyjiang OK, should I modify the release notes for that version? Is it https://github.com/karmada-io/karmada/blob/master/docs/CHANGELOG/CHANGELOG-0.10.md ?

@chaunceyjiang (Member) commented

[screenshot] @wangxf1987 Here, in your PR description.

@wangxf1987 (Contributor, Author) commented

> Expose the default port for the karmada-controller-manager, scheduler and agent when creating a PodMonitor.

Done.

@RainbowMango (Member) commented

Is this for ServiceMonitor and PodMonitor?

By the way, this PR focuses on exposing ports in the Helm chart template; what about the operator and CLI? Should we keep them consistent?
cc @chaosi-zju

@chaosi-zju (Member) commented May 14, 2024

Let me talk about my understanding of this PR:

First, our karmada-controller-manager does listen on port 8080 for its metrics API; we defined it in

Metrics: metricsserver.Options{BindAddress: opts.MetricsBindAddress},

We can try it as follows:

$ kubectl --context karmada-host exec -it karmada-controller-manager-cdbbf75dd-69fxb -n karmada-system -- sh
/ # curl http://127.0.0.1:8080/metrics
.....
work_sync_workload_duration_seconds_bucket{result="success",le="0.512"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="1.024"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="2.048"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="+Inf"} 4
work_sync_workload_duration_seconds_sum{result="success"} 0.075089723
work_sync_workload_duration_seconds_count{result="success"} 4

Even without this PR, we can still access the /metrics API of karmada-controller-manager from another pod:

$ kubectl --context karmada-host exec -it karmada-descheduler-7d654cbff6-5jmrt -n karmada-system -- sh

# here, `10.244.0.12` is the pod IP of `karmada-controller-manager`.
/ # curl http://10.244.0.12:8080/metrics
.....
work_sync_workload_duration_seconds_bucket{result="success",le="0.512"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="1.024"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="2.048"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="+Inf"} 4
work_sync_workload_duration_seconds_sum{result="success"} 0.075089723
work_sync_workload_duration_seconds_count{result="success"} 4

So the normal operation of the /metrics API does not depend on the changes proposed by this PR.


However, @wangxf1987 needs to use Prometheus to collect metrics, as described in this tutorial:
https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md#using-podmonitors

Take PodMonitor as an example:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  podMetricsEndpoints:
  - port: web

In practice, the spec.selector labels tell Prometheus which Pods should be scraped, while spec.podMetricsEndpoints defines the endpoint on those Pods that serves Prometheus metrics.

Pay attention to spec.podMetricsEndpoints:

https://github.com/prometheus-operator/prometheus-operator/blob/5b880a05ab72112ff26ac3c3eb5bffddbafbc468/pkg/apis/monitoring/v1/podmonitor_types.go#L169-L184

The official API recommends using the port field, which is the name of the Pod port this endpoint refers to, instead of the deprecated targetPort field, which is the name or number of the target port of the Pod object behind the Service.

So here comes the problem: we didn't declare the port in the container spec of karmada-controller-manager and give it a name that @wangxf1987 could reference in spec.podMetricsEndpoints of a PodMonitor.

That is why this PR was proposed.
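
For illustration, a minimal sketch of the two pieces together (the port name metrics and the app: karmada-controller-manager selector label are assumptions for this sketch, not the PR's actual diff):

# Container spec fragment: give the metrics port a name.
ports:
- name: metrics              # the name a PodMonitor can reference
  containerPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: karmada-controller-manager
  namespace: karmada-system
spec:
  selector:
    matchLabels:
      app: karmada-controller-manager
  podMetricsEndpoints:
  - port: metrics            # refers to the named container port above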

Hi @wangxf1987, am I right?

@chaosi-zju (Member) commented

> By the way, this PR focuses on exposing ports in the Helm chart template; what about the operator and CLI? Should we keep them consistent?

If we all agree on the necessity of this PR, we should open an issue to track the corresponding modifications for the other installation methods.

@wangxf1987 (Contributor, Author) commented

> Is this for ServiceMonitor and PodMonitor?
>
> By the way, this PR focuses on exposing ports in the Helm chart template; what about the operator and CLI? Should we keep them consistent? cc @chaosi-zju

The port is exposed for PodMonitor only.

@wangxf1987 (Contributor, Author) commented

> Let me talk about my understanding of this PR: […] Hi @wangxf1987, am I right?

Yes, the port name needs to be specified in the PodMonitor, so the metrics port name needs to be defined in the container.

@wangxf1987 (Contributor, Author) commented May 14, 2024

I think we also need a new PR describing how to monitor the search component. Similarly, the port in the search Service needs to be given a name:

---
apiVersion: v1
kind: Service
metadata:
  name: {{ $name }}-search
  namespace: {{ include "karmada.namespace" . }}
  labels:
    {{- include "karmada.search.labels" . | nindent 4 }}
spec:
  ports:
    - name: https # add this
      port: 443
      protocol: TCP
      targetPort: 443
  selector:
    {{- include "karmada.search.labels" . | nindent 4 }}

@wangxf1987 (Contributor, Author) commented

> Let me talk about my understanding of this PR: […] Hi @wangxf1987, am I right?

When the port is defined by default, it is used by Prometheus, like this:

[screenshot]
[screenshot]

@wangxf1987 (Contributor, Author) commented

> I think we also need a new PR describing how to monitor the search component. […]

[screenshot]

@chaosi-zju (Member) commented

It looks really fantastic! 👍

Did you set up this monitoring by following the existing karmada documentation, or in your own way?

By the way, I very much hope you can share the complete steps of your monitoring setup as a document ٩(๑❛ᴗ❛๑)۶. If you are willing to contribute a document to karmada, you can refer to this doc.

@RainbowMango (Member) commented

@chaosi-zju Would you like to run a test as per this document, and share it with us at a community meeting?

@chaosi-zju (Member) commented

> Would you like to run a test as per this document, and share it with us at a community meeting?

OK

@RainbowMango (Member) left a comment

/assign

@RainbowMango (Member) left a comment

This PR focuses on exposing the metrics endpoint for karmada-agent, karmada-controller-manager and karmada-scheduler.
What about the other components? I suppose all components' configuration should be synchronized.

@@ -60,6 +60,10 @@ spec:
     initialDelaySeconds: 15
     periodSeconds: 15
     timeoutSeconds: 5
+    ports:
+    - containerPort: 10351
+      name: http
@RainbowMango (Member) commented on the diff above

Suggested change:
-      name: http
+      name: metrics

Why not name it metrics?

@RainbowMango (Member) commented

/retitle Expose containing port of serving metrics

@karmada-bot karmada-bot changed the title Expose default port for monitoring, in case of create PodMonitor or S… Expose containing port of serving metrics May 25, 2024