SQL Sentry Tips and Tricks: Alert Mining
It’s critical to have meaningful, actionable alerts. I spend a lot of time helping SQL Sentry® users tweak alerts to position them in a more proactive stance rather than a reactive stance. However, many DBAs are overwhelmed with the numerous roles and responsibilities they must manage day to day. Alert tuning often becomes a random, intermittent activity where the noisiest problem during a particular day or week gets the most attention. This leads to a whack-a-mole management process that doesn’t address core issues.
I like to show SQL Sentry users how to data mine for the most frequent alerts triggering in their environment and then dig deeper into each of the top offenders in descending order. This allows for vast amounts of alerting noise to be addressed in a targeted manner.
In this blog post, I’ll walk through a simple demonstration of how to tackle alert mining. Be advised, the core message of this blog post is focused on how to surface the culprits behind alerting noise, not how to handle what bubbles up. For more insight into options for tuning alerts for the objects/events surfaced during the alert mining process, check out my blog posts, “Alert Tuning Basics” and “Common Alert Tuning Examples.”
Alert Mining Demonstration
First, look at which alerts are triggering most frequently across your monitored environment. You’ll find the main culprit of alerting noise is often job failures.
Example result set for the noisiest alerts
Next, dig deeper into which jobs are generating the alerting noise. In this example, we quickly see a subset of jobs generating the majority of the failure alerts. At this point, you’ll want to reflect on these jobs by asking yourself the following questions:
- What are they?
- What do they do?
- What does the failure volume indicate? Are these jobs expected to fail often, or could this indicate a resolvable issue with the jobs rather than a need to tune alerting for them?
- Is there value in being alerted to the job failures? If so, do you need to be alerted to every single occurrence?
Example result set of jobs causing the most failure alerts
Now, let’s look at the second highest count alert: deadlocks. At the bottom of the alert mining script included at the end of the blog post, there are queries capable of pulling deadlock counts by application, database, user, resource, or text data. For simplicity, we’ll just look by application in this example.
You’ll want to reflect on these alerts by asking yourself the following questions:
- Is there value in getting alerts for a particular deadlock?
- Are the deadlocks tied to a third-party application you have little to no control over or another process with low criticality? If so, does it make sense to be alerted on it?
- Do you need to be alerted to every occurrence of a particular deadlock, or only if it snowballs? (For example, you could receive alerts only when there have been X number of deadlocks within Y amount of time.)
Example result set of applications causing the most deadlocks
If blocking or runtime/duration max conditions are surfaced, check out the related examples in my “Common Alert Tuning Examples” blog post. It’s important to understand the threshold settings before thinking about other ways you might want to adjust alerting.
Again, the concept is simple. Identify the cause of most of these types of alerts, dig into the specific events or objects causing the majority of alerts, and reflect on those events or objects. Then, consider more meaningful ways of being alerted (if you need to be at all). I recommend doing this every 2 – 4 weeks initially. Once things get to a better place, I always recommend running through this script at least quarterly.
- By default, SentryOne keeps one year’s worth of alerting data, and the queries in the provided script look back at the past 90 days. As you go through subsequent tuning sessions, make sure you set the EventStartTime to a date after your final round of tuning. Otherwise, “old data” can skew the true impact of the tuning previously done. Because of this, I recommend adding a comment noting the date of the last time you went through the script so you know what to set the EventStartTime to the next time around.
- The general flow of this script is the lower you go down, the more granular it becomes.
- If you have an alert appear for which I haven’t provided a query, there’s a general template query at the bottom of the provided alert mining script where you can just update the ConditionTypeName.
- Advisory Conditions tied to the Start page have a different table schema than the “send email” actions used in this blog post. To learn more about digger deeper into Advisory Conditions, check out my blog post, “Tuning Advisory Conditions.”
I hope with this script and explanation of how to run through alert mining, you can now quickly and effectively tune your SQL Sentry alerts. Check out the section below for the alert mining script.
Download the Code Covered in This Blog Post
Additional SQL Sentry Tips and Tricks Blog Posts
- SentryOne® Top SQL Tips and Tricks
- SentryOne Performance Analysis Dashboard Tips and Tricks
- SentryOne Event Calendar Tips and Tricks
- SQL Server Alert Tuning Basics With SentryOne
- SQL Sentry Tips and Tricks: Adjusting Navigator Pane Highlighting
- SentryOne Default Startup Action: Performance Overview
- SentryOne Tips and Tricks: Common Alert Tuning Examples
- SentryOne Tips and Tricks: Tuning Advisory Conditions
- Analyzing Deadlocks in SQL Sentry
Patrick is a Customer Success Engineering Manager and helps to provide customers with support services, troubleshooting, and defect resolution. Patrick also does some side work with SentryOne’s Professional Services team, who provide training and consulting. Patrick’s blog focus is on helping SQL Sentry users get the most out of our products. His hope is to ensure our users are as knowledgeable and comfortable as possible with SQL Sentry products, so they have as much ammo as possible to solve real world SQL Server problems.