SQL Server Alert Tuning Basics with SentryOne
Alert systems should notify you of events or states that are meaningful or actionable. However, far too often I find that alerts aren’t tuned, resulting in hundreds or even thousands of email notifications being filed away in an email folder to act as a historical log of what happened at a certain point in time.
The good news is that SentryOne offers plenty of capabilities that allow you to configure alerts to be relevant and meaningful, which is a huge step toward proactively managing SQL Server performance.
I could probably write a book on the importance of alert tuning, but today, I want to focus on some of the basic alert tuning features that you can leverage in SentryOne. In a future blog post, I will address common alert use cases I run into frequently with other SentryOne users.
To set up effective alerts, you must understand that you can leverage the Topology you create in the Navigator pane to override and modify conditions and settings at various levels. The level or object you select in the Navigator pane defines the scope for the conditions shown in the Conditions and Settings panes. The screenshot below shows the selected level indicated at the top of the Conditions pane.
Defining the Scope of Conditions Shown in the Conditions Pane by Selecting a Group of Targets in the Navigator Pane
The level you select can be based on environment, application, workload type, or any combination of these and other classifications. You can even drill down to specific jobs, SSRS reports, etc.
For example, maybe you don’t want to receive alerts related to activity on your Development targets. Or, perhaps you want to be alerted on blocking at 15 seconds for your OLTP targets, but one minute for your Reporting targets. You can address these and other situations by organizing the Topology in a way that supports your alerting needs.
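To make the override idea concrete, here is a minimal sketch of how a setting might resolve through a hierarchy. It is not SentryOne's implementation: the group names, the `PARENTS` and `OVERRIDES` tables, and the `effective_blocking_threshold` function are all hypothetical, and the sketch simply assumes that the nearest explicitly set value wins.

```python
from typing import Optional

# Hypothetical topology: each node names its parent (None marks the global root).
PARENTS = {
    "Global": None,
    "Production": "Global",
    "Development": "Global",
    "OLTP": "Production",
    "Reporting": "Production",
}

# Hypothetical overrides: blocking threshold in seconds, or None to disable the alert.
OVERRIDES = {
    "Global": 15,         # default applied everywhere
    "Reporting": 60,      # reporting targets tolerate longer blocking
    "Development": None,  # no blocking alerts for development targets
}


def effective_blocking_threshold(node: str) -> Optional[int]:
    """Walk up from the node and return the nearest explicitly set value."""
    current: Optional[str] = node
    while current is not None:
        if current in OVERRIDES:
            return OVERRIDES[current]
        current = PARENTS[current]
    raise KeyError(f"no setting found for {node}")


for group in ("OLTP", "Reporting", "Development"):
    print(group, effective_blocking_threshold(group))
```

In this sketch, OLTP inherits the global 15 seconds, Reporting uses its own one-minute value, and Development resolves to no alert at all.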
Although leveraging the Topology is quite powerful, it might not cover every situation. What if our logical groupings have inherently separated like objects? Take the following situation as an example. We have a Development group and a Production group. Under each group, there are OLTP and Reporting targets. If I want to make a broad change to conditions or settings for all my OLTP or Reporting servers, I might think I am out of luck because I have segregated my targets by environment levels. Well, have no fear, Object Groups are here!
Example of the Topology Grouping Inherently Separating Like Objects
Object Groups allow you to group monitored objects within SentryOne above and beyond the traditional Topology. In our example situation, I can put my OLTP and Reporting targets into an Object Group and make a modification or explicitly define conditions and settings at the Object Group level.
Example of Object Groups
Keep in mind that because Object Groups are outside the Topology, you don’t override conditions at the Object Group level. Instead, you explicitly add and modify conditions and/or settings as needed at the Object Group level. These conditions and settings are honored above all else, including global conditions and settings, for the related objects.
It’s important to make a quick note about condition/action identity. Each condition/action object is independent of other condition/action objects. Changes made at the condition/action object level aren’t applied to or inherited by other condition/action objects, even if they have the same condition and action. The ability to have multiple versions of the same condition/action object can be leveraged to treat subsets of objects differently through conditional logic. (I will cover how you can leverage this functionality in my future blog post about alerting use cases.)
Alert Modification Options
When you select a particular condition/action object in the Conditions pane, several tabs appear on the bottom half of the pane. These options can be used to make modifications to how and when conditions/actions trigger.
As the name implies, the Action Settings tab allows you to configure settings that are specific to actions. The options shown differ based on the action. For example, for Send Email or Send Page actions, the Action Settings tab provides recipient options for you to choose from. You can add or substitute alert recipients (e.g., setting developers as the recipients for alerts about development server behavior) or include a ticketing system and/or on-call group for critical alerts.
Two options that tend to get overlooked, but can have great use cases, are the Importance level and non-default From Address options. When configuring SMTP settings in SentryOne, you define a default From Address that all SentryOne alerts “come from.” The default From Address can be changed at the condition level, which is helpful when creating email rules to send alerts based on importance or scope to specific folders, making it easier to manage and organize alerts. Leveraging a different From Address can also allow third-party ticketing systems to treat alerts differently.
Note that setting the Importance level to Critical for relevant conditions can flag the alert in your email to ensure you don’t miss a critical alert.
Action Settings Tab
The Condition Settings tab can be used to filter for or filter out specific events, objects, etc. from being alerted on. You have several conditional logic options (e.g., And, Or, Groups) to choose from, as well as a variety of parameters to choose from based on the condition’s scope (e.g., Category for job-related conditions).
Because the Condition Settings tab is effectively a filter, it’s understandable that you might think of condition settings as a “blacklist,” meaning you would try to use “Equals” or “Contains” logic to filter something out. However, given that this is conditional logic, it actually works the opposite way. If you want to filter for something, you need to use “Equals” or “Contains” logic. If you want to filter something out, you need to use “Does not Equal” or “Does not Contain” logic.
If this messes with your brain, you’re not alone. I minored in Cognitive Science, which required logic- and conditional-based courses (shout out to anyone who stuck with linguistics), and I still have to use a little trick to remember this logic sometimes. I have dubbed it the “as long as” trick. Take, for example, the [Category] Does not contain Maintenance condition filter shown in the screenshot below. If it’s related to a Job Failure condition, I would say something like, “Alert on every job that fails, as long as the job’s category does not contain Maintenance.” This logic would result in filtering out jobs with the string “Maintenance” in their defined category so that they wouldn’t be alerted on when they fail.
Condition Settings Tab
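If it helps to see the “as long as” trick as code, the sketch below treats each filter as a test that must pass before an alert is sent. The helper names (`contains`, `does_not_contain`, `should_alert`) and the sample jobs are hypothetical, and the case-sensitive substring match is an assumption for illustration, not a statement about how SentryOne compares strings.

```python
from typing import Callable, Dict, List

Job = Dict[str, str]
Filter = Callable[[Job], bool]


def contains(field: str, text: str) -> Filter:
    """Filter FOR jobs whose field includes the text."""
    return lambda job: text in job[field]


def does_not_contain(field: str, text: str) -> Filter:
    """Filter OUT jobs whose field includes the text."""
    return lambda job: text not in job[field]


def should_alert(job: Job, filters: List[Filter]) -> bool:
    """Alert on the failed job as long as every filter holds (And logic)."""
    return all(f(job) for f in filters)


# Hypothetical failed jobs collected by a Job Failure condition.
failed_jobs = [
    {"name": "Nightly index rebuild", "category": "Database Maintenance"},
    {"name": "Load sales orders", "category": "ETL"},
]

# "Alert on every job that fails, as long as its category does not contain Maintenance."
filters = [does_not_contain("category", "Maintenance")]
alerted = [job["name"] for job in failed_jobs if should_alert(job, filters)]
print(alerted)
```

Only the ETL job passes the filter here, so the maintenance job failure generates no alert.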
Rulesets are a way to implement additional buffer logic to delay the execution of an action until that logic is met. There are Count Based and Time Based rulesets. Let’s say, for example, that you have a server tied to a third-party vendor application, and this application is prone to occasional deadlocks. Now, there isn’t much you can do to resolve these deadlocks, given that they're happening with a third-party application. (Also, alerts on the occasional deadlock event aren’t particularly helpful and just add noise.) However, you don’t want to disable deadlock alerting altogether because occasionally you get snowball blocking that can quickly cause deadlocks in volume.
In this case, you can implement a Count Based ruleset that defines logic so that you’re only alerted if there have been 10 deadlocks within 5 minutes (or whatever X and Y values make the most sense for your environment). Doing so reduces the noise caused by a few daily outliers, and you will receive an alert only when a real issue occurs.
Count Based Ruleset
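The “10 deadlocks within 5 minutes” logic can be sketched as a sliding window over event timestamps. This is only an illustration of the idea: the `CountBasedRule` class is hypothetical, and whether SentryOne resets its count after an alert fires is not something this sketch tries to model.

```python
from collections import deque


class CountBasedRule:
    """Signal an alert once `count` events fall within `window_seconds`."""

    def __init__(self, count: int, window_seconds: float) -> None:
        self.count = count
        self.window_seconds = window_seconds
        self.events: deque = deque()

    def record(self, timestamp: float) -> bool:
        """Record one event and report whether the threshold is now met."""
        self.events.append(timestamp)
        # Discard events that are older than the window.
        while timestamp - self.events[0] > self.window_seconds:
            self.events.popleft()
        return len(self.events) >= self.count


rule = CountBasedRule(count=10, window_seconds=300)

# One deadlock per minute never reaches 10 within any 5-minute span.
occasional = [rule.record(t) for t in range(0, 600, 60)]

# A burst of 10 deadlocks in 10 seconds trips the rule on the tenth event.
burst = [rule.record(1000 + i) for i in range(10)]
print(any(occasional), burst[-1])
```

The occasional deadlocks stay silent, and only the burst produces an alert.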
There are also Time Based rulesets. You can use these rulesets for more “stateful” types of events and define logic that states “only alert me if this condition has been consistently true for X amount of time.”
You can even define Subsequent Actions. For the Send Email action, these are essentially reminder emails, which can be helpful even if you set Process Actions After to zero so that you’re alerted immediately.
For example, out of the box, SentryOne applies the Block Notification ruleset to the Blocking SQL condition. This ruleset defines behavior that alerts you as soon as SentryOne collects a blocking event (at 15 seconds by default) and continues to send another email about the same event every 5 minutes, for as long as the blocking event continues to be active.
Time Based Ruleset
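As a rough mental model of a Time Based ruleset with Subsequent Actions, the sketch below lists when emails would go out for a condition that stays active for a given length of time. The function and parameter names are hypothetical (only Process Actions After comes from the product), and offsets are measured from the moment the event is first collected.

```python
from typing import List


def notification_offsets(
    active_seconds: int, process_after: int, repeat_every: int
) -> List[int]:
    """Return the offsets (in seconds) at which an email would be sent.

    process_after: how long the condition must stay true before the first email.
    repeat_every:  interval between reminder emails; 0 means no reminders.
    """
    offsets = []
    t = process_after
    while t <= active_seconds:
        offsets.append(t)
        if repeat_every <= 0:
            break
        t += repeat_every
    return offsets


# Alert immediately, then remind every 5 minutes, for a block lasting 12 minutes.
print(notification_offsets(active_seconds=720, process_after=0, repeat_every=300))

# "Only alert me if this has been true for 5 minutes": a 2-minute blip stays silent.
print(notification_offsets(active_seconds=120, process_after=300, repeat_every=300))
```

The first call mirrors the immediate-plus-reminders behavior described above, and the second shows how a delay suppresses short-lived events.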
It’s common for there to be a bit of alerting noise during maintenance hours. Although there are options available that limit or disable alerting during certain periods, users often don’t want to go to that extreme. There might be a few alerts on events, such as job failures or blocking, that might occur more frequently during a maintenance period.
If these events aren’t really problems and are even expected to some degree, then there is no point in having to spend the first few minutes of your day reviewing and deleting meaningless alert emails. To address this issue, you can set a Window that defines logic to not alert on a condition during a specified time range.
You can also set up Windows at the Contact or Contact Group level. The most common use case for Windows at these levels is for on-call situations.
If you have ever been woken up in the middle of the night due to a Send Page (text message) alert on your phone, and you weren’t even on-call, you certainly aren’t alone. You can address this alerting issue by separately defining a general Contact Group for business day alert recipients and a separate on-call Contact Group. Then, at each Contact Group level, you can define the time range in which that group should receive alerts. It’s easy to move recipients in and out of the on-call Contact Group. Users can even be included in multiple Contact Groups at once if necessary.
Contact Group Alert Windows
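The window logic for Contact Groups can be pictured with the short sketch below. The group names, the times, and the `in_window` and `groups_to_notify` helpers are hypothetical. The one detail worth noticing is that an on-call window usually wraps past midnight, so the check has to handle a start time that is later than the end time.

```python
from datetime import time
from typing import List


def in_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls in [start, end), allowing overnight windows."""
    if start <= end:
        return start <= now < end
    # The window wraps past midnight, e.g. 18:00 to 08:00.
    return now >= start or now < end


# Hypothetical Contact Groups and the time ranges in which they receive alerts.
GROUP_WINDOWS = {
    "Business Hours DBAs": (time(8, 0), time(18, 0)),
    "On-Call": (time(18, 0), time(8, 0)),
}


def groups_to_notify(now: time) -> List[str]:
    """List the Contact Groups whose window covers the given time of day."""
    return [
        group
        for group, (start, end) in GROUP_WINDOWS.items()
        if in_window(now, start, end)
    ]


print(groups_to_notify(time(2, 30)))
print(groups_to_notify(time(10, 0)))
```

With these windows, a 2:30 AM page reaches only the on-call group, and a 10:00 AM alert reaches only the business-day group.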
Note that when you select a Contact or Contact Group in the Navigator pane, the Conditions pane shows the conditions that are tied to it.
SentryOne offers a wide variety of additional actions. You can enable automated responses, treat critical alerts differently, and integrate SentryOne alerts with other alerting/ticketing systems, all of which can improve the overall alerting landscape.
Let’s look at some key actions. First, there is Send Page, which enables you to receive a text message alert for your most critical conditions. Check out the blog post "Send SQL Sentry Alerts to your Phone" for more information about how to set up this action.
Then you have the Log to Database action. You can use this action if you want to log condition occurrences for historical review without adding more alert noise. I also often start with this action when testing new alerts (to avoid spamming everyone before having a chance to tune alerts) so that I have some data to leverage for identifying alert tuning opportunities.
As for third-party integration, there are some additional options if you can’t email the application directly. You can leverage the Send SNMP Trap, Log to Windows Event Log, or Execute PowerShell actions.
SentryOne also offers automated response actions. In today’s world, automation is key to scalability. Using out-of-the-box conditions or custom Advisory Conditions, you can set the action to Execute SQL, Execute Job, Execute Process, or Execute PowerShell. In certain situations, these actions are extremely powerful because they can automatically resolve issues or log additional data that SentryOne might not collect out of the box.
The SentryOne team plans to deliver more options for customizing the format and information included in alert emails in the future. In the meantime, check out the blog post "Customizing Your Alert Emails in SentryOne" for details about how you can use the Execute SQL action to do so.
These are just a few of the alert tuning options available in SentryOne. As I mentioned previously, I’ll share common alert use cases that I encounter when working with SentryOne users in an upcoming blog post. In the meantime, be sure to check out my other SentryOne Tips and Tricks blog posts.
If you haven't tried SentryOne yet and want to explore our alerting capabilities further, you can download a 14-day trial here.
Patrick is a Customer Success Engineering Manager and helps to provide customers with support services, troubleshooting, and defect resolution. Patrick also does some side work with SentryOne’s Professional Services team, who provide training and consulting. Patrick’s blog focus is on helping SQL Sentry users get the most out of our products. His hope is to ensure our users are as knowledgeable and comfortable as possible with SQL Sentry products, so they have as much ammo as possible to solve real world SQL Server problems.