Suppress monit alerts upon dependency

Alert dependencies are useful for network equipment, since connectivity often depends on a chain of devices. Why get alerts that your whole network is down when the real fault is in your router, link or whatever sits upstream? Setting up an alert dependency is possible with monit, but it’s a bit tricky.

This is because monit’s then alert action only takes the depends keyword into account when the dependency is in the unmonitored state. That makes an alert dependency tricky to create, because then unmonitor turns off monitoring of the dependency, so you will not be notified when the error is resolved. A solution is to add another watcher that turns monitoring back on.

Here is an example where a failed ICMP ping turns off notifications for failed TCP services:

# used to turn off alerts with the depends keyword
check host seljebuUp with address seljebu.no
    if failed icmp type echo count 3 with timeout 3 seconds then unmonitor

# turns on seljebuUp when error resolved, will also alert upon host up/down
check host seljebu with address seljebu.no
    if failed icmp type echo count 3 with timeout 3 seconds then exec "/bin/true"
    else if succeeded then exec "/usr/sbin/monit start seljebuUp"

# dependent on seljebuUp
check host seljebu-service with address seljebu.no
    if failed port 143 proto imap for 4 times within 5 cycles then alert
    if failed port 465 type tcpssl proto smtp for 4 times within 5 cycles then alert
    if failed url http://seljebu.no/ for 4 times within 5 cycles then alert
    depends seljebuUp

A small drawback here is the double alerts upon an outage and the extra action alerts (from monit start seljebuUp). You can filter out the extra alerts by including $event in the subject (notifications for seljebuUp can then be discarded). You could also turn off the action alerts by using set alert your@email.com but not on { action } in the global config (be careful, as this will turn off all action alerts).
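The subject filtering can be sketched like this. It is only a sketch under assumptions: the helper name write_mail_format and the target path are made up for illustration, and the subject line uses the uppercase variable names $EVENT, $SERVICE and $HOST as they are written in monit's documentation. With the service name in the subject, a mail filter can discard anything mentioning seljebuUp.

```shell
# Hypothetical helper: write a mail-format snippet to the file given as $1.
write_mail_format() {
    # 'EOF' is quoted so $EVENT, $SERVICE and $HOST reach monit unexpanded.
    cat > "$1" <<'EOF'
set mail-format {
    from: monit@mydomain.no
    subject: monit $EVENT: $SERVICE on $HOST
}
EOF
}

# Example (as root): write_mail_format /etc/monit/conf.d/mail-format
```

If your monitrc already contains a set mail-format statement, it is probably safer to merge the subject line into that one than to keep two.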

monit – monitor your services

After I got my home server up and running, I was looking for an easy way to monitor its services. Earlier I’ve used MRTG, Smokeping, Cacti and similar, but this time I was looking for something really easy and lightweight. The graphs in MRTG and similar tools are useful to have, but my main focus was to monitor the health of a service and get alerted when it is unhealthy.

When I found Monit, I understood that monitoring software with a web front end, graphs and user management (Zabbix, Zenoss Core and others) was total overkill for my task. Monit is easy to use, repairs faulty services and has a nice web front end for viewing the current status.

Install

Since Monit comes with Ubuntu, it’s easy:

apt-get install monit

Enable monit at startup by changing startup=0 to startup=1 in /etc/default/monit:

nano /etc/default/monit
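If you would rather not open an editor, the same change can be scripted. This is a minimal sketch, assuming the file contains a line that starts with startup=0; the helper name enable_monit_startup is made up for illustration:

```shell
# Hypothetical helper: set startup=1 in the defaults file given as $1.
enable_monit_startup() {
    conf="${1:-/etc/default/monit}"
    # -i.bak edits in place and keeps the original as "$conf.bak".
    sed -i.bak 's/^startup=0/startup=1/' "$conf"
}

# Example (as root): enable_monit_startup
```

The backup copy makes it easy to revert if the file turns out to look different on your system.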

Edit /etc/monit/monitrc. The file is self-explanatory and there is good documentation at http://mmonit.com/monit/documentation/monit.html. My stripped file looks like this:

set daemon 120                  # check services at 2-minute intervals
    with start delay 60         # optional: delay the first check by 1 minute
set logfile syslog facility log_daemon
set mailserver localhost        # primary mailserver
set mail-format { from: monit@mydomain.no } # some mailservers bounce domains not found in DNS
set alert arve@mydomain.no
set httpd port 2812 and
    use address localhost       # only accept connection from localhost
    allow localhost             # allow localhost to connect to the server and
    allow monit:passwd          # allow user monit with password passwd
include /etc/monit/conf.d/*     # include conf files

I then created files in /etc/monit/conf.d. Here is my localhost file.

check system localhost
    if loadavg(5min) > 4 then alert
    if loadavg(15min) > 2 then alert

# filesystems
check filesystem root with path /
    if space usage > 80% then alert
check filesystem home with path /home
    if space usage > 95% then alert

# cron
check process cron with pidfile /var/run/crond.pid
    start program = "/etc/init.d/cron start"
    stop program = "/etc/init.d/cron stop"
    if 5 restarts within 5 cycles then timeout

# sshd
check process sshd with pidfile /var/run/sshd.pid
    start program = "/etc/init.d/ssh start"
    stop program = "/etc/init.d/ssh stop"
    if failed port 22 protocol ssh then restart
    if 5 restarts within 5 cycles then timeout

# samba
check process smbd with pidfile /var/run/samba/smbd.pid
    start program = "/etc/init.d/smbd start"
    stop program = "/etc/init.d/smbd stop"
    if failed host 192.168.1.2 port 139 type TCP then restart
    if 5 restarts within 5 cycles then timeout

# nfs
check host server.lan with address 192.168.1.2
    start = "/etc/init.d/nfs-kernel-server start"
    stop = "/etc/init.d/nfs-kernel-server stop"
    if failed port 2049 then restart

Services are automagically restarted, and you are alerted by email, if the pid file does not exist or no running process has that pid. The if statements hold extra tests for a service, and services without a pid file can be watched by checking them over the network.

Services elsewhere on the network can be watched too. Here is another server I’m watching:

check host seljebu.no with address seljebu.no
    if failed port 143 proto imap with timeout 2 seconds then alert
    if failed port 465 type tcpssl proto smtp with timeout 2 seconds then alert
    if failed url http://seljebu.no/ with timeout 2 seconds then alert

Restart monit.

service monit restart

Look for errors.

tail /var/log/syslog
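On a busy machine the tail of syslog may be mostly unrelated messages. A minimal sketch that keeps only the lines mentioning monit, assuming the log lives at /var/log/syslog as above; the helper name recent_monit_log and the default of 20 lines are arbitrary choices:

```shell
# Hypothetical helper: print the last N syslog lines that mention monit.
# $1 = log file (default /var/log/syslog), $2 = line count (default 20).
recent_monit_log() {
    grep -i 'monit' "${1:-/var/log/syslog}" | tail -n "${2:-20}"
}

# Example: recent_monit_log /var/log/syslog 5
```

Running monit -t before the restart is another way to catch mistakes early, since it only checks the syntax of the control file.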

If you get the error message /etc/monit/conf.d/host:12: Error: syntax error ‘=’, replace all curly quotes (”) with straight quotes (").
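Curly quotes usually sneak in when a config is copied from a web page. A minimal sketch for cleaning up a file, assuming a UTF-8 encoded file and a sed that supports -i with a backup suffix; the helper name fix_quotes is made up for illustration:

```shell
# Hypothetical helper: replace curly double quotes in file $1 with straight ones.
fix_quotes() {
    # Handles both the opening (“) and closing (”) variants; keeps a .bak copy.
    sed -i.bak -e 's/“/"/g' -e 's/”/"/g' "$1"
}

# Example (as root): fix_quotes /etc/monit/conf.d/host
```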

That’s it!