Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: raise alert on long running queries per user instead of single pid #29

Merged

Conversation

dcupif
Copy link
Contributor

@dcupif dcupif commented May 15, 2024

This commit modifies the existing alert on long running queries to be raised at the user level instead of the finer granularity of the pid level.

The reason is that if a query holds a lock which blocks several queries, we will raise an alert for each query on hold, therefore spamming our alert channels and creating noise.

Reducing the granularity to the user will improve readability and also understand quicker which apps and services are impacted.

This commit modifies the existing alert on long running queries to be
raised at the user level instead of the finer granularity of the pid
level.

The reason is that if a query holds a lock which blocks several queries,
we will raise an alert for each query on hold, therefore spamming our
alert channels and creating noise.

Reducing the granularity to the user will improve readability and also
understand quicker which apps and services are impacted.
@dcupif dcupif requested a review from vmercierfr May 15, 2024 12:44
@dcupif dcupif self-assigned this May 15, 2024
@dcupif dcupif requested a review from qfritz May 15, 2024 12:44
@dcupif
Copy link
Contributor Author

dcupif commented May 15, 2024

#sre-storage

@dcupif
Copy link
Contributor Author

dcupif commented May 15, 2024

@vmercierfr this would be a breaking change since I modified the alert name. How would you like to handle the release of such change? I would create a new release, but we should adhere to semantic versioning standards I guess

summary: "Long running query on {{ $labels.datname }} of {{ $labels.target }}"
description: "{{ $labels.usename }} is running a long query on {{ $labels.datname }} of {{ $labels.target }} with pid {{ $labels.pid }}"
summary: "Long running queries on {{ $labels.datname }} of {{ $labels.target }} initiated by {{ $labels.usename }}"
description: "{{ $labels.usename }} is running long queries on {{ $labels.datname }} of {{ $labels.target }}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be useful to keep the pid in the description as well?
This way you'd have everything you need right away to run a pg_cancel_backend or pg_terminate_backend.

WDYT?

Copy link
Contributor Author

@dcupif dcupif May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this setup there are potentially several pids, to be honest I'm unsure if I can access them, and how I could properly display it. But yes, if possible it'd be nice to have them all in the description I guess!

@vmercierfr
Copy link
Contributor

vmercierfr commented May 17, 2024

@vmercierfr this would be a breaking change since I modified the alert name. How would you like to handle the release of such change? I would create a new release, but we should adhere to semantic versioning standards I guess

LGTM, we would release a v0.2.0-beta.1 to be consistent

Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.
https://semver.org/

Meanwhile, I documented how to release versions: #30

@dcupif dcupif merged commit 0aff185 into main May 21, 2024
5 checks passed
@dcupif dcupif deleted the group-alerts-for-long-running-queries-to-avoid-alert-spam branch May 21, 2024 07:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants