Nagios is a very powerful, Linux-based, Open Source server and network monitoring system. The Core version is free, and the more recent “XI” version is their “Enterprise” offering, with formal support and additional features. Whichever version you choose, a large part of its functionality comes from the addons and plugins written by third parties for various software and hardware platforms, most of which can be found on the Nagios Exchange website.
For example, if you want to monitor performance counters, event logs, services and the like on Windows clients, then you need a Windows-based Nagios agent such as NSClient++, which is the replacement for NSClient and NRPE_NT.
If, on the other hand, you want to run Nagios itself on Windows then you’re quite mad, but you can still do it with a package like NagWin, which wraps up all the bits Nagios needs in Cygwin, along with Blat for sending email notifications.
Once you’re up and running, you might want some improved graphing of your monitored hosts; the built-in trending only covers the service states (OK, Warning, Critical) rather than the actual values, such as you might get with Disk Space or CPU Utilization monitoring. There are several packages available to do this, but my personal choice is Nagiosgraph; it’s very easy to set up, doesn’t need a heavy-duty database (it uses RRD files for storing the graphing information) and is simple to add retroactively to your existing configuration.
And if you’re using Nagiosgraph, why not make the most of it, by including relevant graphs in your email notifications? Amongst other things, these extremely useful scripts from Frank4DD will nicely format your emails, add company logos, links to the relevant hosts and services within Nagios and, where applicable, graphs from Nagiosgraph showing the last 24 hours of activity for the subject of the notification. Highly recommended.
Finally, for the moment, one of the trickier systems I’ve found to monitor has been NetApp filers; they expose everything you could possibly imagine via SNMP, but not in a way that’s easy to interrogate for, say, free/used space on a single volume. Initially, I tracked down a promising looking addon called check_netapp_du, but it had a tiny problem: older ONTAP versions only exposed signed 32-bit SNMP values for disk space, reported in kilobytes. Some of you may have worked out the problem here. A signed 32-bit value tops out at 2,147,483,647, and that many kilobytes is 2TB, so if your volume is more than 2TB, you get some decidedly odd results returned. Thankfully, newer versions of ONTAP also have 64-bit SNMP counters, which give you, well, lots of room to work with.
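To make the overflow concrete, here is a minimal Python sketch of what happens when a volume size is squeezed into a signed 32-bit field. It assumes the counter is in kilobytes (which is what puts the limit at 2TB) and that the value simply wraps around; a real filer or SNMP agent might clamp or otherwise misreport instead. The helper name as_signed_32 is mine, not something from the original script.

```python
def as_signed_32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


KB_PER_TB = 1024 ** 3  # kilobytes in one (binary) terabyte

# Assumption: sizes are reported in kilobytes and wrap on overflow.
for size_tb in (1, 2, 3):
    true_kb = size_tb * KB_PER_TB
    print(f"{size_tb}TB volume: true={true_kb} KB, as signed 32-bit={as_signed_32(true_kb)}")
```

Anything at or beyond the 2TB mark comes back negative under these assumptions, which is exactly the kind of "decidedly odd" number a disk usage check cannot do anything sensible with.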
The following modified script checks these counters instead of the 32-bit ones and has altered file paths for grep, awk, etc. on Debian, but otherwise behaves identically to the original script: check_netapp-du. The relevant 64-bit OIDs are .1.3.6.1.4.1.789.1.5.4.1.29 for the disk total (dfBT) and .1.3.6.1.4.1.789.1.5.4.1.30 for the disk used (dfBU) values.
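For anyone curious what the check boils down to once the two values have been fetched, the following is a rough Python sketch of the threshold logic, not the actual script (which is shell). It assumes you already have the total and used figures in kilobytes from the 64-bit counters. The function name evaluate_volume, the volume name and the 80/90 percent thresholds are all made up for illustration; only the 0/1/2/3 exit codes are the standard Nagios plugin convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_volume(name, total_kb, used_kb, warn_pct=80.0, crit_pct=90.0):
    """Return (exit_code, status_line) for one volume.

    The thresholds are illustrative defaults, not values from the original script.
    """
    # Negative or zero totals are what a wrapped 32-bit counter tends to look like.
    if total_kb <= 0 or used_kb < 0:
        return UNKNOWN, f"UNKNOWN - {name}: implausible counters total={total_kb} used={used_kb}"

    used_pct = 100.0 * used_kb / total_kb
    if used_pct >= crit_pct:
        code, label = CRITICAL, "CRITICAL"
    elif used_pct >= warn_pct:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"

    # Performance data after the pipe is what graphing addons pick up.
    perfdata = f"used_pct={used_pct:.1f}%;{warn_pct:g};{crit_pct:g};0;100"
    return code, f"{label} - {name}: {used_pct:.1f}% used | {perfdata}"


# Hypothetical 3TB volume with 2TB used.
code, line = evaluate_volume("/vol/vol1/", 3 * 1024 ** 3, 2 * 1024 ** 3)
print(line)
```

A real plugin would print the status line and then exit with the returned code, so that Nagios can map it to OK, Warning or Critical.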
I’m currently working on a number of scripts for monitoring Exchange 2010 servers and DAGs with Nagios, but they’re still very much in Beta, so I’ll save them for another day.