Automated monitoring of Hyper-V virtual machine replication using PowerShell

Backstory

A few months ago, my manager tasked me with setting up automated monitoring and alerting of replication for the virtual machines on our Microsoft Hyper-V Server 2012 R2 (core) hypervisor servers, and IM-ed me "http://luka.manojlovic.net/2015/03/05/simple-hyper-v-replica-warning-critical-state-notification/" as a starting point. I was more than happy to take the task on: I had been meaning to do it anyway, and I love programming but, unfortunately, it's rarely a part of my job.

I appreciated the starting point but I despise alerting via email because it's a nightmare to manage.

Our Remote Monitoring and Management (RMM) system, MAX RemoteManagement, is integrated with our helpdesk / Professional Services Automation (PSA) system, Autotask. The main benefit of this is that when the RMM system detects an issue, it automatically creates a ticket containing the details in the PSA system. Making use of this made far more sense than email alerts, so I decided to look into it.

I found that MAX RemoteManagement does not natively support this task but it does support the usage of custom scripts. The relevant documentation states the following:
 

Which scripting languages are supported?

The following script types are supported where a handler is installed:
Windows: DOS Batch, JavaScript, Perl, PHP, PowerShell, Python, Ruby, VBS and CMD
Linux and OSX: Shell scripts and interpreted languages (Perl, PHP, Python, Ruby)

For a check to be reported as passed or failed, what return codes should be returned by the script?

The check will be reported as passed when the return code from the script is 0. All other return codes will cause the check to be reported as failed.
— https://dashboard.systemmonitor.co.uk/dashboard/helpcontents/index.html?faqs3.htm

So, this suggested that I should simply be able to use a PowerShell script to check the VM replication health and return the appropriate exit codes.
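
As a minimal sketch of that pass/fail contract (not the final script), the mapping from replication health to an exit code could look like the following. The function name `Get-ReplicationExitCode` and its `HealthStates` parameter are hypothetical and exist only for illustration; the health values are the ones that the full script checks for.

```powershell
# Hypothetical helper: the name and parameter are illustrative assumptions,
# not part of the original script or of MAX RemoteManagement.
function Get-ReplicationExitCode {
    param(
        # Replication health values, e.g. "Normal", "Warning", "Critical"
        [string[]] $HealthStates = @()
    )

    # Collect every state that should cause the check to fail
    $Unhealthy = @($HealthStates | Where-Object { $_ -in 'Warning', 'Critical' })

    # 0 = check passes; any other value = check fails
    if ($Unhealthy.Count -gt 0) { return 1 }
    return 0
}

Write-Host (Get-ReplicationExitCode -HealthStates 'Normal', 'Critical')
```

In a real check, the returned value would be handed to `Exit` so that the RMM agent sees it as the script's return code.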

I wasn't going to waste my time creating a PowerShell script for a task if one already existed, so I browsed the web for one that I could adapt to our needs. I didn't find much, and what I did find was either unsuitable or not to my satisfaction (hence this post).

So, I created my own.

I quickly got a crude-but-functional system working, but over the next few months I noticed pitfalls and issues, so I regularly refined and adapted it.

A few months later, I give you "Check Hyper-V VM replication health v6.ps1".

Code

# mythofechelon.co.uk

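# Wrap in @() so that zero or one result is still treated as an array
$ReplicatedVMs = @(Get-VMReplication);

# Initialise the result arrays so that the later .Length checks and += appends are well defined
$ReplicatedVMs_Broken = @();
$ReplicatedVMs_All = @();

If ($ReplicatedVMs.Length -GT 0) {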
	# VMs with replication enabled found
	
	# Document VM replication health
	ForEach ($VM in $ReplicatedVMs) {
		If (($VM.ReplicationHealth -Eq "Warning") -Or ($VM.ReplicationHealth -Eq "Critical")) {
			# Current VM has replication issues
			
			# Add current broken VM's name and replication health to multidimensional array
			$ReplicatedVMs_Broken += ,@($VM.VMName, $VM.ReplicationHealth);
		}
		
		# Add current VM's name and replication health to multidimensional array
		$ReplicatedVMs_All += ,@($VM.VMName, $VM.ReplicationHealth);
	}
	
	# Generate output part 1/2 and MAX RemoteManagement exit codes
	If ($ReplicatedVMs_Broken.Length -Eq 0) {
		# VM replication issues not found
		
		Write-Host "SUCCESS. Found no VMs with replication issues. Details:";
			
		# Set exit code to pass MAX RemoteManagement check
		$ExitCode = 0;
	} Else {
		# VM replication issues found
		
		Write-Host "FAIL. Found VMs with replication issues. Details:";
		
		# Set exit code to fail MAX RemoteManagement check
		$ExitCode = 1;
	}
	
	# Generate output part 2/2
	For ($i = 0; $i -LT $ReplicatedVMs_All.Length; $i++) {
		$Output = 'VM "' + $ReplicatedVMs_All[$i][0] + '" replication health "' + $ReplicatedVMs_All[$i][1] + '"';
		
		Write-Host $Output;
	}
} Else {
	# Found no VMs with replication enabled
	
	Write-Host "Found no VMs with replication enabled.";
		
	# Set exit code to pass MAX RemoteManagement check
	$ExitCode = 0;
}

Exit $ExitCode;
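
Before adding a script like this as an RMM check, it can help to confirm locally which exit code it actually returns. The sketch below uses a temporary stand-in script that always exits with 1, so it runs without a Hyper-V host; the file name is a placeholder, and it assumes that the execution policy allows local scripts to run. To test the real script, point `$ScriptPath` at your saved copy and drop the `Set-Content` and `Remove-Item` lines.

```powershell
# Stand-in for the real check script (hypothetical file name): it prints a
# message and exits with 1, as a failing check would.
$ScriptPath = Join-Path ([System.IO.Path]::GetTempPath()) 'exit-code-demo.ps1'
Set-Content -Path $ScriptPath -Value 'Write-Host "FAIL. Demo output."; Exit 1'

# Run the script with the call operator. Its Exit statement ends the script
# only, and the code is recorded in the automatic variable $LASTEXITCODE.
& $ScriptPath

$ObservedExitCode = $LASTEXITCODE
Write-Host "Exit code: $ObservedExitCode"

# Clean up the temporary stand-in script
Remove-Item -Path $ScriptPath
```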

Screenshots

[Screenshot: the whole system when issues exist]

[Screenshot: the whole system when no issues exist]

I hope this is helpful to someone. :)
