Add DORA metrics #1317

Open
wants to merge 5 commits into
base: main

Conversation

fsequeira1

The idea behind this is to add support for some DORA metrics:

Deployment Frequency — How often an organization successfully releases to production

(flagger_count_canary_success - flagger_count_canary_success offset 24h) / 24  # average deployments per hour over the last 24h
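
To make the arithmetic explicit: subtracting the counter value from 24 hours earlier gives the number of deployments in the last day, and dividing that difference by 24 turns it into an hourly average. Below is a minimal Python sketch of that calculation. The function names and sample values are hypothetical and not part of Flagger, and the reset handling is a simplification of what PromQL's `increase()` does.

```python
def counter_increase(now: float, before: float) -> float:
    """Increase of a monotonically growing counter between two samples."""
    # A counter that went down was reset (for example, the pod restarted).
    # After a reset, the current value is the count since the restart.
    if now < before:
        return now
    return now - before


def deployments_per_hour(now: float, before: float, window_hours: float = 24.0) -> float:
    """Average successful deployments per hour over the window."""
    # Take the difference first, then divide: (now - before) / window.
    return counter_increase(now, before) / window_hours


# Made-up samples: the counter read 100 a day ago and reads 148 now.
per_hour = deployments_per_hour(148, 100)
per_day = counter_increase(148, 100)
print(per_hour, per_day)  # 2.0 48
```

Dividing only the older sample by 24, rather than the difference, would give a number with no clear meaning, so the grouping matters.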

Change Failure Rate — The percentage of deployments causing a failure in production

100 * flagger_count_canary_failure / (flagger_count_canary_success + flagger_count_canary_failure)
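
Change failure rate is the number of failed deployments divided by all deployments (successful plus failed). Division binds tighter than addition in PromQL, as in most languages, so the sum in the denominator needs its own parentheses. Here is a minimal Python sketch of the intended calculation, with a hypothetical function name and made-up counts:

```python
def change_failure_rate(failures: float, successes: float) -> float:
    """Percentage of deployments that failed, out of all deployments."""
    total = successes + failures
    if total == 0:
        # No deployments yet; report 0 instead of dividing by zero.
        return 0.0
    return 100.0 * failures / total


# Made-up counts: 2 failed and 8 successful canary runs.
print(change_failure_rate(2, 8))  # 20.0

# Without the inner parentheses the sum is evaluated last,
# which gives a different (and meaningless) number:
print(100.0 * (2 / 8 + 2))  # 225.0
```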

I'm wondering if there is any way to grab the amount of time it takes a commit to get into production and how long it takes to recover from a failure in production.
However, I'm unsure if I can do it with the current metrics.

fsequeira1 and others added 5 commits November 17, 2022 11:55
Signed-off-by: Filipe Sequeira <filipe@weave.works>
@stefanprodan
Member

I'm wondering if there is any way to grab the amount of time it takes a commit to get into production

The histogram could be used to extract this. Also, I think the two metrics that you've added can be composed from the existing histogram count metric.

@fsequeira1
Author

I assumed the same; however, I couldn't get the results I wanted. Maybe I'm doing something wrong. I want to know the number of times there is a failure, which is different from the number of failures that you can see in the following picture:
image

Can you elaborate or point me to how I could use the histogram to extract that info? I didn't follow.

@stefanprodan
Member

The histogram I'm referring to is called canary_duration_seconds; please try that.

@fsequeira1
Author

fsequeira1 commented Nov 18, 2022

I think this will solve the issue with one of the metrics (Lead Time for Changes). However, the other (time to recover from a failure) seems to depend on external factors (the deployment can be healthy while the application is failing), so the best way is probably to collect it from some kind of external checker such as blackbox-exporter.

Lead Time for Changes (from a deployment point of view) — The amount of time it takes a commit to get into production

(rate(gotk_reconcile_duration_seconds_sum[5m]) / rate(gotk_reconcile_duration_seconds_count[5m]) > 0)
+
(rate(flagger_canary_duration_seconds_sum[5m]) / rate(flagger_canary_duration_seconds_count[5m]) > 0)
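
Dividing the rate of a histogram's `_sum` by the rate of its `_count` gives the mean observed duration over the window. The `> 0` filter appears intended to drop windows with no observations. Below is a minimal Python sketch of the same idea: the mean reconcile time plus the mean canary time. The function names and sample numbers are hypothetical.

```python
from typing import Optional, Tuple


def mean_duration(sum_delta: float, count_delta: float) -> Optional[float]:
    """Mean duration in a window, from the increase of _sum and _count."""
    if count_delta <= 0:
        # Nothing was observed in the window, so there is no mean to report.
        return None
    return sum_delta / count_delta


def lead_time_seconds(
    reconcile: Tuple[float, float], canary: Tuple[float, float]
) -> Optional[float]:
    """Mean reconcile duration plus mean canary duration, if both exist."""
    parts = [mean_duration(*reconcile), mean_duration(*canary)]
    if any(p is None for p in parts):
        return None
    return sum(parts)


# Made-up window: 3 reconciles totalling 90s, 2 canary runs totalling 1200s.
print(lead_time_seconds((90, 3), (1200, 2)))  # 630.0
```

Two PromQL details are worth keeping in mind when turning this into a query. First, `+` binds tighter than comparison operators such as `>`, so each filtered ratio needs its own parentheses. Second, a binary `+` between two instant vectors only pairs series whose label sets match, so metrics from two different exporters may need aggregation or an `on(...)` clause before they can be added.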
