Describe the bug
A couple of times in the last 3 weeks we have observed a strange behaviour: checks are performed twice and notifications are sent twice (at least the Problem one), yet no Recovery notifications are ever sent.
Each time it happened within a short time frame (e.g. between 8am and 9am), on a varying number of servers/services with no common pattern between them.
The checker and notification features are enabled in HA on both masters. Both masters log the same entries in icinga2.log (is that normal? Are they performing the same action in parallel?). On both of them I see the following lines, where a Problem notification is sent but the Recovery one is not:
[2024-02-03 08:20:18 +0100] information/Checkable: Checkable 'hostxxx!servicexxxx' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2024-02-03 08:20:18 +0100] information/Notification: Sending 'Problem' notification 'hostxxx!servicexxxx!state-notification-to-service' for user 'dummy_user'
[2024-02-03 08:20:18 +0100] information/Notification: Completed sending 'Problem' notification 'hostxxx!servicexxxx!state-notification-to-service' for checkable 'hostxxx!servicexxxx' and user 'dummy_user' using command 'state-notification'.
[2024-02-03 08:20:18 +0100] information/Checkable: Checkable 'hostxxx!servicexxxx' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2024-02-03 08:43:18 +0100] information/Checkable: Checkable 'hostxxx!servicexxxx' has 1 notification(s). Checking filters for type 'Recovery', sends will be logged.
[2024-02-03 08:43:18 +0100] information/Checkable: Checkable 'hostxxx!servicexxxx' has 1 notification(s). Checking filters for type 'Recovery', sends will be logged.
The first screenshot below corresponds to the log above. The messages saying the notification was not sent are odd: the Problem notification was sent anyway, while the Recovery one was not.
The two screenshots show that every check/action is performed twice or more (soft state, hard state, OK, notifications).
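To make the pattern in the log excerpt above easier to spot across a whole icinga2.log, here is a minimal Python sketch. It counts, per checkable and notification type, how often filters were checked and how often a notification was actually sent, then lists the pairs that were checked but never sent. The regular expressions are based only on the log lines quoted in this issue, the function names `summarize` and `unsent` are hypothetical, and the sketch assumes a notification object name is the checkable name followed by `!` and the notification name, as in the excerpt.

```python
import re
from collections import Counter

# Patterns mirror the icinga2.log lines quoted in this issue (assumption:
# other Icinga 2 versions or log levels may format these lines differently).
CHECK_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] information/Checkable: Checkable '(?P<obj>[^']+)' "
    r"has \d+ notification\(s\)\. Checking filters for type '(?P<type>[^']+)'"
)
SEND_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] information/Notification: Sending '(?P<type>[^']+)' "
    r"notification '(?P<notif>[^']+)' for user '(?P<user>[^']+)'"
)


def summarize(lines):
    """Count filter checks and sends per (checkable, notification type)."""
    checks = Counter()
    sends = Counter()
    for line in lines:
        m = CHECK_RE.match(line)
        if m:
            checks[(m["obj"], m["type"])] += 1
            continue
        m = SEND_RE.match(line)
        if m:
            # Assumption: 'host!service!notification' -> checkable 'host!service'.
            checkable = m["notif"].rsplit("!", 1)[0]
            sends[(checkable, m["type"])] += 1
    return checks, sends


def unsent(checks, sends):
    """Return (checkable, type) pairs whose filters were checked but never sent."""
    return sorted(key for key in checks if sends[key] == 0)


# The relevant lines from the excerpt in this issue.
SAMPLE = [
    "[2024-02-03 08:20:18 +0100] information/Checkable: Checkable 'hostxxx!servicexxxx' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.",
    "[2024-02-03 08:20:18 +0100] information/Notification: Sending 'Problem' notification 'hostxxx!servicexxxx!state-notification-to-service' for user 'dummy_user'",
    "[2024-02-03 08:20:18 +0100] information/Checkable: Checkable 'hostxxx!servicexxxx' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.",
    "[2024-02-03 08:43:18 +0100] information/Checkable: Checkable 'hostxxx!servicexxxx' has 1 notification(s). Checking filters for type 'Recovery', sends will be logged.",
    "[2024-02-03 08:43:18 +0100] information/Checkable: Checkable 'hostxxx!servicexxxx' has 1 notification(s). Checking filters for type 'Recovery', sends will be logged.",
]

checks, sends = summarize(SAMPLE)
print(unsent(checks, sends))
```

On the excerpt, the Recovery type is the only one reported as checked but never sent. To run it against a real log, pass the lines of the file to `summarize`, for example `summarize(open(path, encoding="utf-8"))`, where `path` points to the icinga2.log on each master. The `checks` counter also shows how many times filters were evaluated for the same checkable, which may help quantify the duplicated processing.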
Screenshots
Your Environment
Include as many relevant details about the environment you experienced the problem in
Version used (icinga2 --version): r2.14.1-1
Operating System and version: RHEL 9.2
Enabled features (icinga2 feature list): api-users api checker command graphite ido-mysql mainlog notification
Icinga Web 2 version and modules (System - About): 2.11.4
Config validation (icinga2 daemon -C): OK
If you run multiple Icinga 2 instances, the zones.conf file:
Additional context
We migrated our infrastructure from SLES12.5 (Icinga 2.10.3) to RHEL9 (Icinga 2.14.0) around 2 months ago.
We have also installed jemalloc-5.2.1-2.el9.x86_64.
At the beginning we only monitored test servers (~1000 of them with active notifications) to validate the new Icinga 2 setup.
3 weeks ago we started monitoring the remaining ~2000 Production servers and upgraded Icinga 2 to v2.14.1.
We started seeing the error in the last 3 weeks, but we don't know whether it was introduced by the minor update to 2.14.1 or was already present since the migration. With fewer and less important servers at that time, it might simply have gone unnoticed.