Monitoring Replication With SQL Sentry

Like many DBAs, I've had to deal with replication in a few different environments. While it usually ran pretty smoothly, when things went badlyit was better if I figured that out sooner rather than later. Especially in the case of using replication for HA/DR, you need to know that replication is working and is up to date.

Within SentryOne SQL Sentry, while there isn't a replication monitor like there is in SQL Server Management Studio, you can still be alerted when something in your replicated environment isn't working as planned.

Monitoring Agent Jobs

The first step is to create general alerts on all of the replication based jobs that are currently enabled. Here are the jobs in my environment—and yes, I'm having issues at the moment.

The replication jobs in my environment - and yes, I'm having replication problems

You'll want to create alerts on failures and it might be beneficial to create alerts on retry for the replication agent jobs (the last three above).

You may already have alerts set up for all job failures, but you can also navigate to these jobs in the SQL Sentry navigation pane. If you select a specific job, the applicable alerts will appear.

job alerts

Using Advisory Conditions

Setting up alerts on these job failures gets you part of the way there, but Advisory Conditions can help alert you to additional potential issues in your environment.

Advisory Conditions can be used to create alerts around the values from a T-SQL query, WMI query, performance counter value, or any combination of these.

Here is a link to a zip file with some Advisory Conditions that have been created to monitor replication. Once the file has been unzipped, you can import any or all of these into your environment by opening the conditions list (expand Advisory Conditions in the image above) and with a right click, choose Import. The Advisory Conditions include the following:

Replication is Installed—This should be fairly obvious but could be helpful in the global alerts to let you know if someone has set up replication on one of your servers.

Distributor offline—Also fairly obvious, but any of the databases in your replication scenario go offline, replication will as well.

Distribution Delivery Latency/Transactional Log Reader LatencyBoth of these conditions first check to see if replication is installed (all of our replication conditions do) and, if so, query the Dist:Delivery Latency and Logreader:Delivery Latency performance counters. The min and max settings are set to 5 and 10,000, but those can be adjusted for your environment. The distribution delivery latency measures period of time from when commands are delivered to the distribution database to when they are applied at the subscriber. The Log Reader latency is between commands applied at the publisher to when they are delivered to the distribution database.

Distribution Delivery Cmd/sec/Transactional Log Reader Cmd/secLooking at commands per second can help identify increases in replication traffic or when fewer commands than expected are being delivered indicating some kind of conflict. As with the Advisory Condition above, you can set the min and max values based on your environment.

If there are other performance counters you would like to monitor, you can use the following code, just replace the counter name, to create additional conditions. As a note, this can be used for alerting on any performance counters.

DECLARE 
	@MinThreshold INT,
	@MaxThreshold INT

SELECT 
	@MinThreshold = 5, 
	@MaxThreshold = 10000

SELECT 
	object_name, 
	counter_name, 
	instance_name, 
	cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name like '%:Replication Dist.%'
AND counter_name = 'Logreader:Delivered Cmds/sec'
AND cntr_value NOT BETWEEN @MinThreshold AND @MaxThreshold;

Within Advisory Conditions, you can also alert on the results of a query, so you might want to query MSSubscriptions for any subscriptions with a status of 0 (inactive).

Additional Information

For more information on Advisory Conditions and how to create them, you can all about them in a set of blog posts. You can also join the on-demand or live Virtual Classrooms for more in-depth learning.

Thwack - Symbolize TM, R, and C