
feat: add prometheus rules to export metrics that can be used to observe the impact of StressChaos #4418

Open: kaaass wants to merge 3 commits into master


@kaaass kaaass commented May 14, 2024

What problem does this PR solve?

RFC: chaos-mesh/rfcs#47

This PR implements the experiment-metrics export described in the RFC. Experiment metrics describe the effects of a StressChaos experiment; they are the final metrics that end users observe. The proposed metric name format is chaos_mesh:stress_chaos:<metric_name>. For example:

  • Statistical metrics: chaos_daemon_container_cpu_usage_seconds_total
  • Experiment metrics: chaos_mesh:stress_chaos:container_cpu_usage_seconds_total
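
As a rough illustration of the naming scheme, the sketch below maps a statistical metric name to its experiment metric name. It assumes the mapping is a simple prefix swap (`chaos_daemon_` to `chaos_mesh:stress_chaos:`), which is inferred from the single example above rather than stated by the PR; the function and constant names are hypothetical.

```python
# Assumption: experiment metric names are derived by swapping the prefix.
# This is inferred from the one example in the PR description.
STATISTICAL_PREFIX = "chaos_daemon_"
EXPERIMENT_PREFIX = "chaos_mesh:stress_chaos:"


def experiment_metric_name(statistical_name: str) -> str:
    """Return the experiment metric name for a statistical metric name."""
    if not statistical_name.startswith(STATISTICAL_PREFIX):
        raise ValueError(f"not a statistical metric: {statistical_name!r}")
    return EXPERIMENT_PREFIX + statistical_name[len(STATISTICAL_PREFIX):]


print(experiment_metric_name("chaos_daemon_container_cpu_usage_seconds_total"))
# prints: chaos_mesh:stress_chaos:container_cpu_usage_seconds_total
```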

Each experiment metric is exported with the following labels:

| Label Name | Description |
| --- | --- |
| namespace | The namespace of the container. |
| kind | The kind of the experiment. |
| phase | The phase of the experiment. |
| name | The name of the experiment. |
| uid | The UID of the experiment. |
| pod | The pod name of the selected container. |
| container | The container name of the selected container. |

Experiment metrics are produced by joining the statistical metrics with the relation metrics. The join is performed by Prometheus rules defined in the Helm charts. As a result, each experiment metric has the same value as the corresponding statistical metric, with the experiment's labels added.
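
The join can be pictured with a small Python simulation. This is only a sketch of the idea, not the PromQL rules added by this PR: it assumes the two metric families are matched on `namespace`, `pod`, and `container`, and the sample values, the experiment name `cpu-burn`, and the UID are made up for illustration.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Sample = Tuple[Labels, float]

# Assumption: both metric families share these labels and are joined on them.
JOIN_KEYS = ("namespace", "pod", "container")


def join_experiment_metrics(statistical: List[Sample],
                            relations: List[Labels]) -> List[Sample]:
    """Attach experiment labels to statistical samples; values stay unchanged."""
    joined: List[Sample] = []
    for stat_labels, value in statistical:
        for rel_labels in relations:
            if all(stat_labels.get(k) == rel_labels.get(k) for k in JOIN_KEYS):
                # Merge labels; the value comes from the statistical sample.
                joined.append(({**stat_labels, **rel_labels}, value))
    return joined


# Hypothetical statistical samples (e.g. CPU seconds per container).
statistical = [
    ({"namespace": "default", "pod": "web-0", "container": "app"}, 12.5),
    ({"namespace": "default", "pod": "web-1", "container": "app"}, 3.0),
]

# Hypothetical relation sample: one experiment selecting one container.
relations = [
    {"namespace": "default", "pod": "web-0", "container": "app",
     "kind": "StressChaos", "phase": "Running",
     "name": "cpu-burn", "uid": "hypothetical-uid"},
]

experiment = join_experiment_metrics(statistical, relations)
for labels, value in experiment:
    print(labels, value)
```

Only the container selected by an experiment appears in the result, and a container targeted by several experiments would yield one series per experiment.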

What's changed and how it works?

Proposal: chaos-mesh/rfcs#47

This PR only modifies the Prometheus-related configuration in the Helm charts. It adds several PromQL rules that export metrics related to StressChaos.

Note that PRs #4415 and #4416 are required for this PR to function.

Related changes

  • This change also requires further updates to the website (e.g. docs)
  • This change also requires further updates to the UI interface

Cherry-pick to release branches (optional)

This PR should be cherry-picked to the following release branches:

  • release-2.6
  • release-2.5

Checklist

CHANGELOG

Must include at least one of them.

  • I have updated the CHANGELOG.md
  • I have labeled this PR with "no-need-update-changelog"

Tests

Must include at least one of them.

  • Unit test
  • E2E test
  • Manual test

Side effects

  • Breaking backward compatibility

DCO

If the DCO check fails, run commands like the following to fix it (adjust them to your situation, for example if the failing commit isn't the most recent one):

git commit --amend --signoff
git push --force


kaaass commented May 14, 2024

Here is a screenshot of the feature implemented in this PR. The two StressChaos experiments visibly affect the Pod's CPU usage, and the exported metrics reflect that impact.

[Image: Screenshot 2024-05-14 21 49 26]

Signed-off-by: KAAAsS <admin@kaaass.net>
@STRRL STRRL self-requested a review May 14, 2024 14:18
@STRRL STRRL self-assigned this May 14, 2024
@kaaass kaaass changed the title feat: add prometheus rules to export StressChaos metrics feat: add prometheus rules to export metrics that can be used to observe the impact of StressChaos May 14, 2024
Signed-off-by: KAAAsS <admin@kaaass.net>
Signed-off-by: KAAAsS <admin@kaaass.net>