Missing notification after recovery outside the time period #10025
ref/IP/52794
Actually a single recovery outside the notification time period is enough not to be notified. I tested the master branch with Icinga DB and no special load btw.
If a recovery happens outside the notification time period, NotificationRecovery is added to
As I consider
--- lib/icinga/checkable-notification.cpp
+++ lib/icinga/checkable-notification.cpp
@@ -271,7 +271,7 @@ bool Checkable::NotificationReasonApplies(NotificationType type)
         case NotificationRecovery:
             {
                 auto cr (GetLastCheckResult());
-                return cr && IsStateOK(cr->GetState()) && cr->GetState() != GetStateBeforeSuppression();
+                return cr && IsStateOK(cr->GetState());// && cr->GetState() != GetStateBeforeSuppression();
             }
         case NotificationFlappingStart:
             return IsFlapping();
@julianbrost Any opinion on this (as the commit author) before anyone codes anything?
Have you checked whether this test still passes with that change? I have the feeling that this might result in extra recovery notifications after downtimes when no problem notification was sent.
That's just a PoC. My actual suggestion is not to consult GetStateBeforeSuppression() unconditionally in NotificationReasonApplies(), but only if GetSuppressedNotifications() contains NotificationRecovery or NotificationProblem. Because only then GetStateBeforeSuppression() matters IMAO: icinga2/lib/icinga/checkable-check.cpp Lines 512 to 521 in 9e31b8b
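To make the suggestion concrete, here is a minimal standalone sketch of the two conditions side by side. It is not the Icinga 2 code: `CheckableModel`, its fields, the bit values of the enum, and the assumption that the state before suppression defaults to OK when nothing was suppressed on the checkable are all hypothetical stand-ins chosen for illustration. Only the identifier names mirror the diff above.

```cpp
#include <optional>

// Hypothetical stand-ins; only the names mirror the real Icinga 2 identifiers.
enum ServiceState { ServiceOK, ServiceWarning, ServiceCritical, ServiceUnknown };

// Bit values are arbitrary for this sketch and not taken from Icinga 2.
enum NotificationType : unsigned {
    NotificationProblem  = 1u << 0,
    NotificationRecovery = 1u << 1,
};

struct CheckableModel {
    std::optional<ServiceState> lastState;            // state of the last check result, if any
    ServiceState stateBeforeSuppression = ServiceOK;  // ASSUMPTION: OK unless a suppression recorded something else
    unsigned suppressedNotifications = 0;             // bitmask of NotificationType

    // Condition from the removed line of the diff: always compares
    // against the state before suppression.
    bool RecoveryAppliesCurrent() const {
        return lastState && *lastState == ServiceOK
            && *lastState != stateBeforeSuppression;
    }

    // Suggested variant: only consult the state before suppression if a
    // problem or recovery notification was actually suppressed.
    bool RecoveryAppliesProposed() const {
        if (!lastState || *lastState != ServiceOK)
            return false;

        const unsigned relevant = NotificationProblem | NotificationRecovery;
        if ((suppressedNotifications & relevant) == 0)
            return true;  // nothing relevant suppressed, so the old state does not matter

        return *lastState != stateBeforeSuppression;
    }
};
```

In this model, a checkable whose last state is OK and that has nothing suppressed is rejected by the first condition (OK equals the assumed default) but accepted by the second. Once `NotificationRecovery` is in the mask, both variants fall back to the same comparison, which is the case the downtime test is concerned with.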
Okay, now I think I got it. So
So yes, this could work. Maybe moving the check of
Describe the bug
In our setup we encountered the following problem:
To Reproduce
I could reproduce the bug in a freshly set up Docker environment, but only when putting a lot of load on the database.
It still did not occur deterministically every time.
TimePeriod
you need to edit the day accordingly):
helloworld (e.g. by starting a docker container in the same network: docker run --rm --network icinga-playground --name helloworld -it strm/helloworld-http)
mysqlslap --create-schema=idodb --no-drop --user=root -p --query="SELECT * FROM icinga_objects" --concurrency=500 --iterations=20000
After this I did not get a recovery notification, only the following log:
However new notifications were getting through again.
Even after hours without load on the database:
Expected behavior
I expected to get a recovery notification when the time period started again.
Screenshots
In this example the service hello-service recovered outside the active time period at 9:56. At 10:00 the time period started again and a recovery notification should have been sent to my-user, but it was not, even though new notifications (like at 10:14) were getting through.
Your Environment
icinga2 --version: 2.13.8
Debian GNU/Linux 11 (bullseye)
icinga2 feature list: checker debuglog ido-mysql mainlog notification
2.12.0
1.10.2
0.20.0
2.12.0