Symbiosis monit failure emails in Stretch #129

Closed
andrewladlow opened this issue Jul 30, 2018 · 1 comment

Comments

@andrewladlow
Contributor

The symbiosis-monit script exits with status 75 in any of the following cases: it has been disabled, the machine is still booting, the load is higher than the number of CPU cores, or dpkg is running. For example, on a single-core machine with a load average of 4:

root@jessie:~# grep -c processor /proc/cpuinfo     
1
root@jessie:~# cat /proc/loadavg
4.00 4.00 3.87 5/130 5696
root@jessie:~# /usr/sbin/symbiosis-monit -t email /etc/symbiosis/monit.d -a
root@jessie:~# echo $?
75
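
For illustration, the load condition can be sketched in shell roughly as follows. This is not the symbiosis-monit implementation; `load_exceeds_cores` is a hypothetical helper, and which load average the script actually compares is an assumption (the 1-minute figure is used here).

```shell
#!/bin/sh
# Hypothetical helper, not taken from symbiosis-monit itself.
# Returns 0 (true) when the given load average exceeds the core count.
load_exceeds_cores() {
    # awk handles the floating-point comparison; "+ 0" forces numeric context.
    awk -v load="$1" -v cores="$2" 'BEGIN { exit !((load + 0) > (cores + 0)) }'
}

# Values from the transcript above: load 4.00 on a machine with 1 processor.
if load_exceeds_cores 4.00 1; then
    echo "load too high, a run would be skipped with status 75"
fi
```

On a live system the two arguments could come from the first field of /proc/loadavg and from `grep -c processor /proc/cpuinfo`, as in the transcript.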

In Symbiosis Stretch, systemd treats this exit status as a unit failure and logs it to syslog:

upgrade2 systemd[1]: symbiosis-monit.service: Main process exited, code=exited, status=75/n/a
upgrade2 systemd[1]: symbiosis-monit.service: Unit entered failed state.
upgrade2 systemd[1]: symbiosis-monit.service: Failed with result 'exit-code'.

The failure is also sent as an email:

Subject: Symbiosis monitor detected service failure
root : TTY=unknown ; PWD=/ ; USER=nobody ; COMMAND=/usr/bin/tee /var/tmp/symbiosis-monit.cursor
pam_unix(sudo:session): session opened for user nobody by (uid=0)
Started Symbiosis monitor.
symbiosis-monit.service: Main process exited, code=exited, status=75/n/a
symbiosis-monit.service: Unit entered failed state.
symbiosis-monit.service: Triggering OnFailure= dependencies.
symbiosis-monit.service: Failed with result 'exit-code'.

On busy servers the load will frequently rise above the number of CPU cores, generating a large number of emails. Logging to syslog is useful if there are problems with the symbiosis-monit service itself, but we should probably only send a failure email when an individual test has failed (e.g. apache2), rather than whenever the service as a whole exits non-zero.
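
One possible way to stop status 75 from counting as a failure is sketched below. It is only an illustration and not necessarily what the eventual fix does. `run_tolerating_tempfail` is a hypothetical wrapper that reports status 75 as success and passes every other status through unchanged. systemd's `SuccessExitStatus=` unit setting is another way to get a similar effect.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of Symbiosis: run a command and treat
# exit status 75 ("skipped for now") as success, so that only other
# non-zero statuses are reported as failures.
run_tolerating_tempfail() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq 75 ]; then
        return 0
    fi
    return "$status"
}

# Stand-in commands are used here instead of the real monitor.
run_tolerating_tempfail sh -c 'exit 75'
echo "status after 75: $?"    # prints "status after 75: 0"
```

Called as `run_tolerating_tempfail /usr/sbin/symbiosis-monit -t email /etc/symbiosis/monit.d -a`, a skipped run would no longer look like a failure, while any other non-zero status would still be reported.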

@andrewladlow
Contributor Author

Fixed in f911398
