This script will leverage the Veeam PowerShell snapin in conjunction with Nagios’ check_nrpe plugin (or a similar agent plugin) to check the status of Veeam backup jobs. On any server you’re running these checks on, you will probably need to have run the Install-VeeamToolkit.ps1 script in “C:\Program Files\Veeam\Backup and Replication” first; otherwise the script will be unable to load the Veeam snapin, unless you have PowerShell 3 installed and have your fingers crossed.

You will also need to ensure that your agent allows NRPE arguments and doesn’t strip special characters (it needs to keep the “” quotes around the job name).

If your Veeam SQL database is off-box, your agent service will need to run under an account that has access to the DB on the remote SQL server; otherwise the lookups will fail.

Check command syntax for NSClient++ is something like the below; other agents may differ:
check_veeam = cmd /c echo scripts\check_veeam.ps1 "$ARG1$"; exit($lastexitcode) | powershell.exe -noninteractive -noprofile -command -

Check command syntax for the Nagios commands.cfg is something like the below:
$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_veeam -a $ARG1$

<#
Copyright (c) 2013, Adam Beardwood
All rights reserved.
 
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met: 
 
1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer. 
2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution. 
 
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#>
 
#Check Veeam Backups
#Adam Beardwood 10/07/2013
#Based on a script by Tytus Kurek
#v1.0 - Initial Release

$error.clear()

# Load the Veeam snapin unless it is already loaded in this session
if(-not (Get-PSSnapin | Where-Object {$_.Name -eq "VeeamPSSnapin"})){
	try{Add-PSSnapin -Name VeeamPSSnapin -ErrorAction Stop}
	catch{Write-Host "CRITICAL - Could not load Veeam snapin";exit 2}
}

# The job name is passed in as the first argument (via NRPE)
if($null -eq $args[0]){Write-Host "CRITICAL - You must provide a job name.";exit 2}
$name = $args[0]

$job = Get-VBRJob -Name $name

if($null -eq $job){
	Write-Host "UNKNOWN - Could Not Find Job: $name."
	exit 3
}

# Only take the job's own name once we know it exists, so the UNKNOWN
# message above still shows the name that was requested
$name = $job.Name

# Look up the last session once and reuse it
$status = $job.GetLastResult()
$session = $job.FindLastSession()
$time = $session.EndTime
if($session.State -eq "Working"){
	Write-Host "OK - Job: $name is currently in progress."
	exit 0
}elseif($status -eq "Failed"){
	Write-Host "CRITICAL - Job: $name failed at $time."
	exit 2
}elseif($status -ne "Success"){
	Write-Host "WARNING - Job: $name completed with warnings at $time."
	exit 1
}else{
	Write-Host "OK - Job: $name completed successfully at $time."
	exit 0
}
#End
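The decision logic in the script above can be pulled out into a function and exercised without a Veeam server. This is a minimal sketch, not part of the original script: the function name Get-VeeamCheckResult and its parameters are hypothetical, the result and state are passed in as plain strings, and [pscustomobject] assumes PowerShell 3 or later.

```powershell
# Hypothetical helper: maps a Veeam job's last result and last session state
# to a Nagios exit code and message, following the same if/elseif order as
# the check script above.
function Get-VeeamCheckResult {
    param(
        [string]$Name,    # job name, used only in the message
        [string]$Result,  # e.g. "Success", "Warning", "Failed" (from GetLastResult())
        [string]$State,   # e.g. "Working", "Stopped" (from FindLastSession().State)
        [string]$Time     # end time of the last session, used only in the message
    )
    if ($State -eq "Working") {
        return [pscustomobject]@{ Code = 0; Text = "OK - Job: $Name is currently in progress." }
    } elseif ($Result -eq "Failed") {
        return [pscustomobject]@{ Code = 2; Text = "CRITICAL - Job: $Name failed at $Time." }
    } elseif ($Result -ne "Success") {
        # Any result other than Success or Failed (e.g. Warning) becomes a WARNING
        return [pscustomobject]@{ Code = 1; Text = "WARNING - Job: $Name completed with warnings at $Time." }
    }
    return [pscustomobject]@{ Code = 0; Text = "OK - Job: $Name completed successfully at $Time." }
}

# Example with made-up values
$check = Get-VeeamCheckResult -Name "Nightly" -Result "Warning" -State "Stopped" -Time "10/07/2013 02:00"
Write-Host $check.Text
```

In the real check you would pass $status, $session.State and $time from the Veeam objects and finish with exit $check.Code.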

Hope you find it helpful.



Nagios is great. It monitors all your stuff, uses email/sms/text-to-speech/carrier pigeon to notify you when stuff goes down and has a handy-dandy web interface for keeping track of everything. Sometimes, however, you want more, because you’re greedy.

nagstamon
This helpful floaty-widget thing keeps you notified of any changes to the status of your Nagios-monitored devices, lets you acknowledge & recheck services and supports multiple Nagios instances.
cnagios
cnagios is a full-screen terminal interface for viewing Nagios host and service objects and the durations of their current states, using ncurses. It’s very handy if you’re limited to SSH connectivity to your Nagios server. The site it’s hosted on appears to be down at the moment, so hopefully the author won’t mind me hosting a copy of it for the time being: cnagios-0.29.tar.gz
nagroid
Nagroid is an unofficial Nagios client for Android devices, which is handy because the standard web interface doesn’t play nicely with smartphone-sized screens.
Nagios Checker
Nagios Checker is a Firefox addon that shows the status of your Nagios-monitored devices in the browser’s status bar, similar to nagstamon. It’s very handy if you spend most of your day stuck in a web browser.

So there you go; now you’ve got no excuse for missing that service or server down alert.



A commonly reported issue with NSClient++ is seeing the following error when trying to install the service or run the client:

“The image file c:\[Path]\nsclient++.exe is valid, but is for a machine type other than the current machine”

The cause of this issue is running the wrong version of the client for your architecture, i.e. trying to run the 64-bit version on a 32-bit operating system. The solution, obviously, is to run the correct version.
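If you’re not sure which build a machine needs, PowerShell can tell you whether the operating system itself is 64-bit. A small sketch; the function name is made up for illustration, and [Environment]::Is64BitOperatingSystem requires .NET 4 (PowerShell 3 or later).

```powershell
# Hypothetical helper: reports which NSClient++ build (32- or 64-bit)
# matches the operating system this is run on.
function Get-NSClientBuild {
    # Checks the OS, not the current process, so a 32-bit PowerShell
    # prompt on a 64-bit OS still reports "64-bit"
    if ([Environment]::Is64BitOperatingSystem) { return "64-bit" }
    return "32-bit"
}

Write-Host "This OS needs the $(Get-NSClientBuild) build of NSClient++"
```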



This one is nice and simple and allows you to monitor the health of your Exchange 2010 DAG via Nagios. For this, you will need:

Configure Nagios
Make sure you’ve got the check_nrpe plugin in your libexec folder, then add a new command definition to commands.cfg like so:

define command{
	command_name    check_exrep
	command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -u -t 120 -p 5666 -c check_exch
	}

Then set up service definitions and hosts/hostgroups as you would normally.

Configure NSClient++
In your [NSClient++ Folder]\Scripts folder, create a new PowerShell script file (.ps1) called “exrep.ps1” and put the following code inside, replacing the two [Servername] placeholders to match your environment:

Add-PSSnapin Microsoft.Exchange.Management.PowerShell.E2010

$Status = Get-MailboxDatabaseCopyStatus -Server [Servername]

# 0 = every copy is Mounted or Healthy, 1 = at least one copy needs attention
$flag = 0
$results = @()

foreach($State in $Status){
	# Record every copy as "Name: Status" for the check output
	$results += "$($State.Name): $($State.Status)"
	if(($State.Status -ne "Mounted") -and ($State.Status -ne "Healthy")){
		$flag = 1
	}
}

# Join the entries and strip the "\[Servername]" suffix from the copy names
$output = ($results -join " - ").Replace("\[Servername]","")

Write-Host $output

if($flag -eq 0){
	exit 0
}else{
	exit 2
}
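The output and exit-code logic of exrep.ps1 can also be tried out without an Exchange server by feeding it fake objects. A hedged sketch: Format-DagCheck is a hypothetical name, and it only assumes that each copy has Name and Status properties, as the objects returned by Get-MailboxDatabaseCopyStatus do.

```powershell
# Hypothetical helper (not part of exrep.ps1): builds the output text and
# Nagios exit code from objects with Name and Status properties.
# Needs PowerShell 3+ for [pscustomobject].
function Format-DagCheck {
    param(
        [object[]]$Copies,
        [Parameter(Mandatory = $true)][string]$Server
    )
    $critical = $false
    $parts = @()
    foreach ($copy in $Copies) {
        # Anything other than Mounted or Healthy makes the whole check CRITICAL
        if (@("Mounted", "Healthy") -notcontains [string]$copy.Status) { $critical = $true }
        # Copy names look like "DB01\SERVER"; drop the "\SERVER" suffix
        $parts += ("$($copy.Name): $($copy.Status)").Replace("\$Server", "")
    }
    $code = 0
    if ($critical) { $code = 2 }
    return [pscustomobject]@{ Text = ($parts -join " - "); Code = $code }
}

# Example with made-up database copies
$copies = @(
    [pscustomobject]@{ Name = "DB01\EXCH01"; Status = "Mounted" },
    [pscustomobject]@{ Name = "DB02\EXCH01"; Status = "FailedAndSuspended" }
)
$result = Format-DagCheck -Copies $copies -Server "EXCH01"
Write-Host $result.Text
```

Keeping the formatting in a function like this means the real script only has to call Get-MailboxDatabaseCopyStatus, pass the results in, print Text and exit with Code.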

Next, open up your NSC.ini file and uncomment the “CheckExternalScripts.dll” line. In the [External Scripts] section, create a new entry for “check_exch” like this:

check_exch=cmd /c echo scripts\exrep.ps1 | powershell.exe -noprofile -nologo -command -

Note the trailing “-”, which tells PowerShell to read the -command value from stdin.

Finally, restart the NSClient++ service on the client machine and restart Nagios on the server. When your check next runs, it reports each database copy on that server in the form [Database Name]: [Status], and returns CRITICAL if any copy is not in a Healthy or Mounted state. Repeat the above for each server that is part of the DAG.



Nagios is a very powerful Linux-based Open Source server & network monitoring system. The Core version is free, and the more recent “XI” version is their “Enterprise” offering with formal support and additional features. Whichever version you choose, a large part of its functionality comes from the addons and plugins written by third parties for various software & hardware platforms, most of which can be found on the Nagios Exchange website.

For example, if you want to monitor performance counters, event logs, services and the like on Windows clients, you need a Windows-based Nagios agent such as NSClient++, which is the replacement for NSClient & NRPE_NT.

If, on the other hand, you want to run Nagios itself on Windows, then you’re quite mad, but you can still do it with a package like NagWin, which wraps up all the bits Nagios needs in Cygwin, with Blat for sending email notifications.

Once you’re up and running, you might want better graphing of your monitored hosts; the built-in trending only covers the service states (OK, Warning, Critical) rather than the actual values, such as those you get from Disk Space or CPU Utilization monitoring. There are several packages available to do this, but my personal choice is Nagiosgraph; it’s very easy to set up, doesn’t need a heavy-duty database (it uses RRD files to store the graphing data) and is simple to retroactively add to your existing configuration.
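Graphing addons such as Nagiosgraph typically plot the performance data a plugin prints after a “|” in its output, so your own Windows scripts need to emit it if you want values rather than just states. A minimal sketch following the 'label'=value[UOM];warn;crit;min;max format from the Nagios plugin guidelines; Format-PerfData, the label and the thresholds are all hypothetical.

```powershell
# Hypothetical helper: formats one performance data item in the
# 'label'=value[UOM];warn;crit;min;max layout used by Nagios plugins.
function Format-PerfData {
    param(
        [string]$Label,
        [double]$Value,
        [string]$Unit = "",
        [string]$Warn = "",
        [string]$Crit = "",
        [string]$Min = "",
        [string]$Max = ""
    )
    # Invariant culture so the decimal separator is always "." whatever the locale
    $v = $Value.ToString([Globalization.CultureInfo]::InvariantCulture)
    return "'$Label'=$v$Unit;$Warn;$Crit;$Min;$Max"
}

# Example: a made-up disk usage figure appended after the "|" separator
$used = 73.5
$perf = Format-PerfData -Label "c_used" -Value $used -Unit "%" -Warn "80" -Crit "90" -Min "0" -Max "100"
Write-Host "OK - C: is $used% used | $perf"
```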

And if you’re using Nagiosgraph, why not make the most of it, by including relevant graphs in your email notifications? Amongst other things, these extremely useful scripts from Frank4DD will nicely format your emails, add company logos, links to the relevant hosts and services within Nagios and, where applicable, graphs from Nagiosgraph showing the last 24 hours of activity for the subject of the notification. Highly recommended.

Finally, for the moment, one of the trickier systems I’ve found to monitor has been NetApp filers; they expose everything you could possibly imagine via SNMP, but not in a way that’s easy to interrogate for, say, free/used space on a single volume. Initially, I tracked down a promising-looking addon called check_netapp_du, but it had a tiny problem: older ONTAP versions only exposed signed 32-bit SNMP counters for disk space, and reported the values in kilobytes. Some of you may have worked out the problem here: a signed 32-bit value tops out at 2,147,483,647, which in kilobytes is roughly 2TB, so if your volume is more than 2TB, you get some decidedly odd results returned. Thankfully, newer versions of ONTAP also have 64-bit SNMP counters, which give you, well, lots of bytes to work with.
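The overflow is easy to check with a little arithmetic. This sketch assumes the counters are in kilobytes (as the NetApp MIB defines them) and uses a made-up 3TiB volume to show how a value past the signed 32-bit ceiling wraps around to a negative number; ConvertTo-WrappedInt32 is a hypothetical helper.

```powershell
# Hypothetical helper: keeps the low 32 bits of a value and reinterprets them
# as a signed 32-bit integer, which is what an overflowing signed counter reads as.
function ConvertTo-WrappedInt32 {
    param([int64]$Value)
    $wrapped = $Value % 4294967296                            # 2^32
    if ($wrapped -lt 0) { $wrapped += 4294967296 }
    if ($wrapped -ge 2147483648) { $wrapped -= 4294967296 }   # 2^31
    return $wrapped
}

$maxKB = [int32]::MaxValue                    # 2,147,483,647 KB
$maxTiB = $maxKB / [math]::Pow(1024, 3)       # KB -> TiB, just under 2

# A made-up 3TiB volume, expressed in KB
$actualKB = [int64]3 * 1024 * 1024 * 1024     # 3,221,225,472 KB
$reportedKB = ConvertTo-WrappedInt32 -Value $actualKB

Write-Host "Signed 32-bit ceiling: $maxKB KB (about $([math]::Round($maxTiB, 2)) TiB)"
Write-Host "A 3TiB volume ($actualKB KB) would read as $reportedKB KB"
```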

The following modified script checks these counters instead of the 32-bit ones and uses the Debian file paths for grep, awk, etc., but otherwise behaves identically to the original script: check_netapp-du. The relevant 64-bit OIDs are .1.3.6.1.4.1.789.1.5.4.1.29 for the disk total (dfBT) and .1.3.6.1.4.1.789.1.5.4.1.30 for the disk used (dfBU) values.
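Once you have the 64-bit total and used values, turning them into a Nagios result is simple. This is a sketch in PowerShell, not the logic of check_netapp-du itself (which is a shell script); the function name and the 80%/90% thresholds are hypothetical, and both inputs are assumed to be in kilobytes.

```powershell
# Hypothetical helper: converts total/used KB values (e.g. as read from the
# dfBT and dfBU OIDs) into a percentage and a Nagios exit code.
function Get-VolumeUsageStatus {
    param(
        [uint64]$TotalKB,
        [uint64]$UsedKB,
        [double]$WarnPct = 80,
        [double]$CritPct = 90
    )
    # A zero total means the lookup went wrong, so report UNKNOWN
    if ($TotalKB -eq 0) { return [pscustomobject]@{ Code = 3; Percent = $null } }
    $pct = [math]::Round(100.0 * $UsedKB / $TotalKB, 1)
    $code = 0
    if ($pct -ge $CritPct) { $code = 2 } elseif ($pct -ge $WarnPct) { $code = 1 }
    return [pscustomobject]@{ Code = $code; Percent = $pct }
}

# Example: a made-up 4TiB volume with 3.5TiB used, too big for the 32-bit counters
$usage = Get-VolumeUsageStatus -TotalKB 4294967296 -UsedKB 3758096384
Write-Host "Volume is $($usage.Percent)% used (exit code $($usage.Code))"
```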

I’m currently working on a number of scripts for monitoring Exchange 2010 servers and DAGs with Nagios, but they’re still very much in Beta, so I’ll save them for another day.



Update: This turned out to be a Nagios-related powershell script running against Exchange that was being launched by a service running as LocalSystem, which didn’t have permissions to perform various tasks within Exchange. As soon as we stopped running the script the errors went away. Still no idea why the errors were popping up on servers in the Org that weren’t referenced by the task, but that’s Exchange for you.

Right, I’m throwing this out on the tiny off-chance that anyone has come across it and knows of a solution, because so far, Microsoft support haven’t and don’t.

Frequent entries in the Application logs of all Exchange 2010 Servers as follows:

(Process w3wp.exe, PID <PID>) “RBAC authorization returns Access Denied for user <Mailbox Server Computer Account>. Reason: No role assignments associated with the specified user were found on Domain Controller <Domain Controller FQDN>”

Several things.

1) Everything in <> has obviously been changed by me to remove details of my internal infrastructure; the actual errors contain real PID, account and server values. In all cases, the computer account is that of the Mailbox server, even though the error shows up on Mailbox, CAS and UM servers.

2) This is not, I repeat, not the same issue as you’ll find all over Google with a very similar error message that features a user account rather than a computer account. That one is usually caused by people not setting up permissions for their administrators properly in the ECP or broken permissions inheritance on accounts.

3) This error has survived a complete rebuild (OS and Exchange) of the Mailbox server, a re-running of the domain/forest prep tools and a couple of weeks examination by Microsoft Support. We’re currently looking at rebuilding all the other 2010 servers to see if it survives that too.

Any suggestions will be gratefully accepted.