Skip to content

zimmertr/LibreNMS-Example-Alerts

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LibreNMS Example Alerts

LibreNMS is not very plentiful with their example alerts. It took me a significant amount of time to come up with the following alerts so I thought I would create a repository to retain them in the event that I need to rebuild my LibreNMS server in the future. Or, of course, so others could see my examples as well.

Included alerts

This table holds all of my example LibreNMS alerts.

# Name Rule Severity Extra
1 Host has been down for 5 minutes. %macros.device_down = "1" critical Max: 1 Delay: 300 Interval: 1800
2 CPU usage > 90% for 5 minutes. %processors.processor_usage >= "90" warning Max: 1 Delay: 300 Interval: 900
3 Memory usage > 90% for 5 minutes. %mempools.mempool_perc >= "90" warning Max: 1 Delay: 300 Interval: 900
4 Root filesystem has less than 5% free space available. %storage.storage_descr = "/" && %storage.storage_perc >= "95" critical Max: 1 Delay: 60 Interval: 900
5 Disk usage is abnormally high. %storage.storage_perc > %storage.storage_perc_warn && %devices.type = "server" && %storage.storage_descr !~ "/boot" = "" critical Max: 1 Delay: 60 Interval: 900
6 AUTHENTICATION FAILURE!!!!!!! %syslog.msg ~ "@authentication failure@" = %syslog.timestamp >= %macros.past_5m critical Max: -1 Delay: 300 Interval: 300
7 Network usage > 80% for 5 minutes. %macros.port_usage_perc >= "80" && %port.port_descr_type != "client" && %ports.ifType != "softwareLoopback" ok Max: -1 Delay: 300 Interval: 300
8 CPU Latency > 500ms for 1 minute. %device_perf.avg >= "1000" critical Max: 1 Delay: 60 Interval: 900
9 Network usage > 2MBps for 5 minutes. %ports.ifHighSpeed >= "2" ok Max: 1 Delay: 300 Interval: 900
10 Device discovered within the last 60 minutes %eventlog.type = "discovery" && %eventlog.message ~ "@autodiscovered@" && %eventlog.datetime >= %macros.past_60m ok Max: 1 Delay: 0 Interval: 300
11 Host has been rebooted. %devices.uptime < "300" && %macros.device = "1" warning Max: 1 Delay: 300 Interval: 300
12 Poller is taking longer than expected. %pollers.time_taken >= "250" critical Max: -1 Delay: 300 Interval: 300
13 A network port has gone down. %ports.ifOperStatus = "down" && %ports.ifOperStatus_prev = "up" && %macros.device_up = "1" warning Max: -1 Delay: 300 Interval: 300
14 Interface errors rate is abnormally high. %ports.ifOutErrors_rate >= "100" critical Max: -1 Delay: 300 Interval: 300
15 Host has entered a Warning state. %services.service_status = "1" warning Max: -1 Delay: 300 Interval: 300
16 Host has entered a Critical state. %services.service_status = "2" critical Max: -1 Delay: 300 Interval: 300
17 FQDN does not include the domain name. %devices.sysName !~ "@sol.milkyway" warning Max: -1 Delay: 300 Interval: 300
18 Hostname includes the domain name. %devices.hostname ~ "@.sol.milkyway" warning Max: -1 Delay: 300 Interval: 300
19 IP Address is within the DHCP scope. %devices.ip ~ "192.168.1.2@" warning Max: -1 Delay: 300 Interval: 300
20 Host has not been polled within the last day. %devices.last_polled >= "86400" critical Max: -1 Delay: 300 Interval: 300
21 Host is using a slow network adapter. %ports.port_descr_speed > "1000" ok Max: -1 Delay: 300 Interval: 300
22 CPU ready time > 10% for 5 minutes. %processes.cputime >= "10" critical Max: -1 Delay: 300 Interval: 900
23 Host has less than 1GB of allocated memory. %vminfo.vmwVmMemSize <= "1024" warning Max: -1 Delay: 300 Interval: 300
24 Host has more than 3 VCPUs. %vminfo.vmwVmCpus >= "3" warning Max: -1 Delay: 300 Interval: 300

Alert Template

LibreNMS allows you to customize the alert message that is sent to your transport endpoint. The following is my default message. I did not bother with additional templates.

Alert Title: LibreNMS (%hostname) - NEW ALERT
Recovery Title: LibreNMS (%hostname) - CANCELLATION
Alert Body:

{if %state == 0}Duration: %elapsed{else}Severity: %severity{/if}

{if %name}%name{else}%rule{/if}
---------------------------
Timestamp: %timestamp
Uptime: %uptime_long

TODO

At this time several of the above rules are either not working or flawed. I have not done any troubleshooting, but simply disabled the rules in favor of resolving the issue at a later time.

 - Host has more than 3 VCPUs: This rule does not work effectively as it spams for physical devices > 3 CPU cores as well. (My hypervisor, workstation, & laptop)  
 - Host has less than 1GB of allocated memory: This rule is ineffective as well as it mis-diagnoses physical devices. (Perhaps they return null for vminfo which triggers a false positive?)  
 - FQDN does not include the domain name: For some reason this rule is also not functional. (Every device that is monitored returns a failure for this check during each interval cycle.)  
 - Host has been down for 5 minutes: Times are not always accurate. Also delivers false positives for reboots occasionally. For example, rebooted my laptop (~45 seconds) and received an alert saying it had been down for 5 minutes.

Alert Example

The following is a screenshot of my phone displaying several push notifications using the above rules and templates in conjunction with Pusbullet as my transport endpoint.

About

Collection of my custom LibreNMS alerts & templates

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published