
Metric specific to workflow retries per namespace #5768

Open
tsurdilo opened this issue Apr 19, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@tsurdilo
Contributor

Currently, workflow_retry_backoff_timer seems to be the only metric available for counting workflow retries.

This feature request is to add a workflow_retried counter metric that can be filtered by namespace.

Thanks.
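To make the request concrete, here is a minimal Go sketch of what a retry counter tagged only by namespace means. RetryCounter, Inc, Count and Namespaces are hypothetical names, and the in-memory map merely stands in for a real metrics backend; this is not how the Temporal server records metrics.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// RetryCounter is a hypothetical in-memory stand-in for a "workflow_retried"
// counter that carries a single namespace tag. It is NOT Temporal's metrics API.
type RetryCounter struct {
	mu     sync.Mutex
	counts map[string]int64 // namespace -> number of workflow retries
}

func NewRetryCounter() *RetryCounter {
	return &RetryCounter{counts: make(map[string]int64)}
}

// Inc records one workflow retry in the given namespace.
func (c *RetryCounter) Inc(namespace string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[namespace]++
}

// Count returns the retries recorded for one namespace (the "filter by namespace" case).
func (c *RetryCounter) Count(namespace string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[namespace]
}

// Namespaces returns every namespace that has at least one retry, sorted.
func (c *RetryCounter) Namespaces() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	names := make([]string, 0, len(c.counts))
	for ns := range c.counts {
		names = append(names, ns)
	}
	sort.Strings(names)
	return names
}

func main() {
	c := NewRetryCounter()
	c.Inc("default")
	c.Inc("default")
	c.Inc("billing")
	for _, ns := range c.Namespaces() {
		fmt.Printf("workflow_retried{namespace=%q} %d\n", ns, c.Count(ns))
	}
}
```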

tsurdilo added the enhancement (New feature or request) label Apr 19, 2024
@yiminc
Member

yiminc commented Apr 26, 2024

@tsurdilo how are users going to use this metric? Do we need to tag the workflow ID on the metric? That would cause a cardinality issue.
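The cardinality concern can be illustrated with a short Go sketch, assuming each distinct label combination becomes one time series, as in Prometheus-style backends. retryEvent and seriesCount are hypothetical names; the example counts series for 10,000 retried workflows spread over two namespaces.

```go
package main

import "fmt"

// retryEvent is a hypothetical record of one workflow retry.
type retryEvent struct {
	Namespace  string
	WorkflowID string
}

// seriesCount returns how many distinct time series a counter would create
// for the given events, since each distinct label combination is one series.
// When tagWorkflowID is false, only the namespace label is kept.
func seriesCount(events []retryEvent, tagWorkflowID bool) int {
	seen := make(map[retryEvent]struct{})
	for _, e := range events {
		key := retryEvent{Namespace: e.Namespace}
		if tagWorkflowID {
			key.WorkflowID = e.WorkflowID
		}
		seen[key] = struct{}{}
	}
	return len(seen)
}

func main() {
	// Two namespaces, 10,000 distinct workflows that each retried once.
	var events []retryEvent
	for i := 0; i < 10000; i++ {
		ns := "default"
		if i%2 == 1 {
			ns = "billing"
		}
		events = append(events, retryEvent{Namespace: ns, WorkflowID: fmt.Sprintf("wf-%d", i)})
	}
	fmt.Println("namespace only:", seriesCount(events, false))         // 2
	fmt.Println("namespace + workflow ID:", seriesCount(events, true)) // 10000
}
```

With only the namespace tag, the number of series is bounded by the number of namespaces; adding the workflow ID makes it grow with every workflow that ever retries.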

@yiminc
Member

yiminc commented Apr 26, 2024

The attempt count should also be available from within the workflow. I think workflow.GetInfo() should already return it.

@clayzermk1

Hi folks, @tsurdilo was kind enough to open this issue based on a conversation we were having on Slack. Basically, we manage the control plane but not the workflows themselves, so we would like to know the global retry count at a given time, ideally broken down by namespace.

yiminc added a commit that referenced this issue May 14, 2024
## What changed?
Add workflow_backoff_timer metrics with namespace tag.

## Why?
For #5768
The current task_requests{operation="TimerActiveTaskWorkflowBackoffTimer"} metric does not differentiate whether the backoff is due to retry/cron/delayed_start. The current workflow_retry_backoff_timer metric does not have a namespace tag.

## How did you test it?
Verify the metrics as:

workflow_backoff_timer{backoff_type="Retry",namespace="default",service_name="history"} 1

## Potential risks
No
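A minimal Go sketch of how a consumer might use a metric shaped like the one above: each series carries the backoff_type, namespace and service_name tags, and keeping only backoff_type="Retry" and summing per namespace gives the per-namespace retry count this issue asks for. backoffCounter, labels, exposition and retriesByNamespace are hypothetical names, the in-memory map is not Temporal's metrics handler, and the "Cron" tag value is an assumption (only "Retry" appears in the commit message).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labels is a hypothetical tag set mirroring the tags in the commit message.
type labels struct {
	BackoffType string // "Retry" or another backoff reason ("Cron" below is an assumed value)
	Namespace   string
	ServiceName string
}

// backoffCounter is an illustrative in-memory counter keyed by tag set.
type backoffCounter map[labels]int64

// Inc adds one backoff event for the given tag set.
func (c backoffCounter) Inc(l labels) { c[l]++ }

// exposition renders each series as a Prometheus text-format line, sorted for stable output.
func (c backoffCounter) exposition(name string) []string {
	var lines []string
	for l, v := range c {
		lines = append(lines, fmt.Sprintf("%s{backoff_type=%q,namespace=%q,service_name=%q} %d",
			name, l.BackoffType, l.Namespace, l.ServiceName, v))
	}
	sort.Strings(lines)
	return lines
}

// retriesByNamespace keeps only backoff_type="Retry" series and sums them per namespace.
func (c backoffCounter) retriesByNamespace() map[string]int64 {
	out := make(map[string]int64)
	for l, v := range c {
		if l.BackoffType == "Retry" {
			out[l.Namespace] += v
		}
	}
	return out
}

func main() {
	c := backoffCounter{}
	c.Inc(labels{"Retry", "default", "history"})
	c.Inc(labels{"Cron", "default", "history"})
	c.Inc(labels{"Retry", "billing", "history"})
	c.Inc(labels{"Retry", "billing", "history"})
	fmt.Println(strings.Join(c.exposition("workflow_backoff_timer"), "\n"))
	fmt.Println(c.retriesByNamespace()) // map[billing:2 default:1]
}
```

Because backoff_type is a separate tag, cron and delayed-start backoffs stay out of the retry total without needing a second metric.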
ychebotarev pushed a commit to ychebotarev/temporal that referenced this issue May 16, 2024