feat: add prometheus rules to export metrics that can be used to observe the impact of StressChaos #4418
What problem does this PR solve?
RFC: chaos-mesh/rfcs#47
This PR implements the experiment metrics export described in the RFC. Experiment metrics describe the effects of a StressChaos experiment; they are the final metrics that end users observe. The proposed metric name is:
chaos_mesh:stress_chaos:<metric_name>
For example, chaos_daemon_container_cpu_usage_seconds_total is exported as chaos_mesh:stress_chaos:container_cpu_usage_seconds_total.
It is exported with the following labels:
namespace
kind
phase
name
uid
pod
container
The experiment metrics are produced by joining the statistical metrics with the relation metrics. The join is performed by Prometheus rules defined in the Helm charts. Each experiment metric therefore has the same value as its statistical metric, with the experiment's labels added.
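To illustrate the join described above, here is a minimal Python sketch that simulates it outside Prometheus. It is not the rule shipped in this PR. The choice of pod and container as join keys, the sample label values, and the helper names join_metrics and experiment_metric_name are assumptions made for illustration; the prefix handling is inferred from the single example name given above. In PromQL, a join of this kind is commonly written with on(...) and group_left(...).

```python
from typing import Dict, Iterable, List, Tuple

Labels = Dict[str, str]
Sample = Tuple[Labels, float]

# Labels listed in the PR description for the exported experiment metric.
EXPERIMENT_LABELS = ("namespace", "kind", "phase", "name", "uid", "pod", "container")
# Assumption: series are matched on these two labels.
JOIN_KEYS = ("pod", "container")


def experiment_metric_name(statistical_name: str) -> str:
    """Derive the experiment metric name (inferred from the one example given)."""
    prefix = "chaos_daemon_"
    if statistical_name.startswith(prefix):
        statistical_name = statistical_name[len(prefix):]
    return "chaos_mesh:stress_chaos:" + statistical_name


def join_metrics(statistical: Iterable[Sample], relation: Iterable[Sample]) -> List[Sample]:
    """Keep the statistical value and take the labels from the matching relation series."""
    by_key: Dict[Tuple[str, ...], List[Labels]] = {}
    for labels, _ in relation:
        key = tuple(labels.get(k, "") for k in JOIN_KEYS)
        by_key.setdefault(key, []).append(labels)

    joined: List[Sample] = []
    for labels, value in statistical:
        key = tuple(labels.get(k, "") for k in JOIN_KEYS)
        # Statistical series without a matching relation series are dropped.
        for rel in by_key.get(key, []):
            out = {k: rel[k] for k in EXPERIMENT_LABELS if k in rel}
            joined.append((out, value))
    return joined


# Hypothetical sample data; every label value below is made up.
statistical = [
    ({"pod": "web-0", "container": "app"}, 12.5),
    ({"pod": "web-1", "container": "app"}, 3.0),
]
relation = [
    ({"namespace": "default", "kind": "StressChaos", "phase": "running",
      "name": "cpu-burn", "uid": "0000", "pod": "web-0", "container": "app"}, 1.0),
]

name = experiment_metric_name("chaos_daemon_container_cpu_usage_seconds_total")
for labels, value in join_metrics(statistical, relation):
    print(name, labels, value)
```

Only the series for web-0 appears in the result, because web-1 has no matching relation series. The value is carried over unchanged from the statistical metric.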
What's changed and how it works?
Proposal: chaos-mesh/rfcs#47
This PR only modifies the Prometheus-related configuration in the Helm charts. It adds several PromQL rules that export metrics related to StressChaos.
Note that PRs #4415 and #4416 are required for this PR to function.
Related changes
UI interface
Cherry-pick to release branches (optional)
Checklist
CHANGELOG
CHANGELOG.md
Tests
Side effects
DCO
If you find the DCO check fails, please run commands like below (Depends on the actual situations. For example, if the failed commit isn't the most recent) to fix it: