feat: export container metrics in Chaos Daemon for containerd runtime #4416

Open: kaaass wants to merge 3 commits into master

kaaass commented May 14, 2024

What problem does this PR solve?

RFC: chaos-mesh/rfcs#47

This PR implements the statistical metrics export described in the RFC. Statistical metrics describe the resource usage statistics of a container and are exported by Chaos Daemon. We plan to export the following metrics:

| Metric Name | Description |
| --- | --- |
| chaos_daemon_container_cpu_usage_seconds_total | Total CPU usage in seconds of the container. |
| chaos_daemon_container_memory_working_set_bytes | The amount of working set memory in bytes of the container. |
| chaos_daemon_container_memory_available_bytes | The available memory in bytes of the container. |
| chaos_daemon_container_memory_usage_bytes | The memory usage in bytes of the container. |
| chaos_daemon_container_memory_rss_bytes | The amount of RSS memory in bytes of the container. |
| chaos_daemon_container_memory_page_faults_total | Total number of page faults of the container. |
| chaos_daemon_container_memory_major_page_faults_total | Total number of major page faults of the container. |
| chaos_daemon_container_memory_swap_available_bytes | The available swap in bytes of the container. |
| chaos_daemon_container_memory_swap_usage_bytes | The swap usage in bytes of the container. |

Statistical metrics are exported with the following labels:

| Label Name | Description |
| --- | --- |
| namespace | The namespace of the container. |
| pod | The pod name of the container. |
| container | The container name. |

What's changed and how it works?

Proposal: chaos-mesh/rfcs#47

This PR modifies Chaos Daemon.

This PR retrieves statistical information about the container through the CRI. To achieve this, the interface ContainerRuntimeInfoClient has been extended with a new method, StatsByContainerID, which obtains statistical information about a container from the container runtime based on its container ID. The method exists mainly to decouple callers from the CRI API; its functionality is almost identical to that of runtimev1.RuntimeServiceClient's ContainerStats method.

This PR then exposes the collected statistical information on the /metrics endpoint by extending ChaosDaemonMetricsCollector. Because some counter-type metrics (such as CPU usage seconds) are already cumulative values when they are collected, prometheus.MustNewConstMetric is used to export them (a related discussion).

Related changes

  • This change also requires further updates to the website (e.g. docs)
  • This change also requires further updates to the UI interface

Cherry-pick to release branches (optional)

This PR should be cherry-picked to the following release branches:

  • release-2.6
  • release-2.5

Checklist

CHANGELOG

Must include at least one of them.

  • I have updated the CHANGELOG.md
  • I have labeled this PR with "no-need-update-changelog"

Tests

Must include at least one of them.

  • Unit test
  • E2E test
  • Manual test

Side effects

  • Breaking backward compatibility

DCO

If the DCO check fails, run commands like the following to fix it (the exact commands depend on the situation, for example when the failing commit isn't the most recent one):

git commit --amend --signoff
git push --force

Signed-off-by: KAAAsS <admin@kaaass.net>
@STRRL STRRL self-assigned this May 14, 2024
@STRRL STRRL self-requested a review May 14, 2024 14:09

cwen0 (Member) commented May 14, 2024

@kaaass Can you provide a test result for this PR?

STRRL (Member) commented May 14, 2024

Hi @kaaass, please run make check to format the code.

kaaass (Author) commented May 14, 2024

> @kaaass Can you provide a test result for this PR?

OK! Here is part of the metrics exported by this PR, fetched directly from /metrics:

# HELP chaos_daemon_container_cpu_usage_seconds_total Total CPU usage in seconds of the container
# TYPE chaos_daemon_container_cpu_usage_seconds_total counter
chaos_daemon_container_cpu_usage_seconds_total{container="nginx",namespace="my-namespace",pod="nginx-deployment-576c6b7b6-kjjk6"} 55.920141
# HELP chaos_daemon_container_memory_available_bytes The available memory in bytes of the container
# TYPE chaos_daemon_container_memory_available_bytes gauge
chaos_daemon_container_memory_available_bytes{container="nginx",namespace="my-namespace",pod="nginx-deployment-576c6b7b6-kjjk6"} 0
# HELP chaos_daemon_container_memory_major_page_faults_total Total number of major page faults of the container
# TYPE chaos_daemon_container_memory_major_page_faults_total counter
chaos_daemon_container_memory_major_page_faults_total{container="nginx",namespace="my-namespace",pod="nginx-deployment-576c6b7b6-kjjk6"} 0
# HELP chaos_daemon_container_memory_page_faults_total Total number of page faults of the container
# TYPE chaos_daemon_container_memory_page_faults_total counter
chaos_daemon_container_memory_page_faults_total{container="nginx",namespace="my-namespace",pod="nginx-deployment-576c6b7b6-kjjk6"} 0
# HELP chaos_daemon_container_memory_rss_bytes The amount of RSS memory in bytes of the container
# TYPE chaos_daemon_container_memory_rss_bytes gauge
chaos_daemon_container_memory_rss_bytes{container="nginx",namespace="my-namespace",pod="nginx-deployment-576c6b7b6-kjjk6"} 0
# HELP chaos_daemon_container_memory_swap_available_bytes The available swap memory in bytes of the container
# TYPE chaos_daemon_container_memory_swap_available_bytes gauge
chaos_daemon_container_memory_swap_available_bytes{container="nginx",namespace="my-namespace",pod="nginx-deployment-576c6b7b6-kjjk6"} 0
# HELP chaos_daemon_container_memory_swap_usage_bytes The swap usage in bytes of the container
# TYPE chaos_daemon_container_memory_swap_usage_bytes gauge
chaos_daemon_container_memory_swap_usage_bytes{container="nginx",namespace="my-namespace",pod="nginx-deployment-576c6b7b6-kjjk6"} 0
# HELP chaos_daemon_container_memory_usage_bytes The memory usage in bytes of the container
# TYPE chaos_daemon_container_memory_usage_bytes gauge
chaos_daemon_container_memory_usage_bytes{container="nginx",namespace="my-namespace",pod="nginx-deployment-576c6b7b6-kjjk6"} 0
# HELP chaos_daemon_container_memory_working_set_bytes The amount of working set memory in bytes of the container
# TYPE chaos_daemon_container_memory_working_set_bytes gauge
chaos_daemon_container_memory_working_set_bytes{container="nginx",namespace="my-namespace",pod="nginx-deployment-576c6b7b6-kjjk6"} 6.53312e+06

Please note that the metrics of other containers are omitted here.

Here is the output of crictl at almost the same time:

$ crictl stats 4cb5ffc8e76f6
CONTAINER           NAME                CPU %               MEM                 DISK                INODES
4cb5ffc8e76f6       nginx               0.13                6.533MB             1.095kB             0

The memory usage reported by crictl (6.533MB) is almost the same as the value of the chaos_daemon_container_memory_working_set_bytes metric (6.53312e+06 bytes).
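The same cross-check can be scripted. The sketch below parses the value from one exposition line and compares it with the crictl figure. `sampleValue` and `closeEnough` are hypothetical helpers, and the code assumes that crictl's MB means 10^6 bytes (which is consistent with the two numbers above), that the sample line carries no timestamp, and that label values contain no spaces.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// sampleValue returns the numeric value of one exposition-format sample line.
// Assumes no trailing timestamp and no spaces inside label values.
func sampleValue(line string) (float64, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return 0, fmt.Errorf("malformed sample line: %q", line)
	}
	return strconv.ParseFloat(fields[len(fields)-1], 64)
}

// closeEnough reports whether a byte count and a megabyte figure agree within
// a relative tolerance. Assumes 1 MB = 10^6 bytes.
func closeEnough(byteCount, megabytes, relTol float64) bool {
	if megabytes == 0 {
		return byteCount == 0
	}
	return math.Abs(byteCount/1e6-megabytes)/megabytes <= relTol
}

func main() {
	line := `chaos_daemon_container_memory_working_set_bytes{container="nginx",namespace="my-namespace",pod="nginx-deployment-576c6b7b6-kjjk6"} 6.53312e+06`

	v, err := sampleValue(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// 6.533 is the MEM column from the crictl output, in MB; 1% tolerance.
	fmt.Printf("bytes=%.0f matches crictl: %v\n", v, closeEnough(v, 6.533, 0.01))
}
```

A tolerance is needed because the two readings are taken at slightly different times and crictl rounds its output.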

Signed-off-by: KAAAsS <admin@kaaass.net>

kaaass (Author) commented May 14, 2024

> Hi @kaaass, please run make check to format the code.

OK, just pushed
