SQL Sentry v8: Intelligent Alerting Redefined
It's tempting to insert some clever reference to everyone's favorite vegetable juice medley here, but I will refrain, for fear of diminishing what is perhaps the most significant SQL Sentry release to date! In v8 we've addressed the three most frequent requests from our users – performance alerting, baselining, and cloud access – and we've done so in typical SQL Sentry fashion. In other words, we haven't been content to tick a feature box – we've gone all out to ensure that each feature has been constructed thoughtfully and thoroughly, and integrates seamlessly with the rest of our software, ultimately providing a superior user experience.
I normally review all major new features in a single post, but this one got so long I had to split it up. First up in the series on v8: Performance Alerting (aka, Custom Conditions).
Pre-v8 Performance Alerting
There are only a couple of feature areas where, until now, a few others in our space have been able to legitimately claim superiority, the big one being performance alerting. You've been able to do it via our Event Manager product for some time, but it was never the most elegant system for general-purpose performance alerting, since it was originally designed around performance related to jobs.
SQL Sentry v8 changes all of this – the new Performance Advisor delivers performance alerting in a big way, and much more. Yes, we've leapfrogged the competition once again. Leapfrog may not be the right term though, as that could imply that at some point they may jump back over us. Knowing what went into building this feature, IMO there is little chance of this happening. (Let's just say it's about as likely as them building a better tool for plan analysis.) ;-)
Enter Custom Conditions
The name custom conditions isn't glamorous... but it does accurately describe the whole feature in a way that advanced performance alerting, enhanced change detection, advisory rule builder, or similar flashier but narrower terms don't. You can do all of these things and more with custom conditions, but ultimately they are just that – a condition that you define on which you want to be alerted and/or take some action. A custom condition can:
- Detect and alert on threshold excursions for:
  - Windows performance counters (including SQL Server, SSAS, etc., counters)
  - Virtual performance counters (those auto-calculated by SQL Sentry from DMVs or other sources)
- Detect and alert on changes to values from:
  - SQL queries against user databases
  - SQL queries against the SQLSentry database
  - DMV/DMF queries
  - WMI queries
- Compare multiple values from the same or different subsystems, using any combination of ANDs and ORs, and any number of nesting levels
- Apply math (multiply, divide, add, subtract) to values from any subsystem, and use the results in comparison operations
- Combine multiple conditions together to create compound conditions
- Detect when any SQL query against a target exceeds a specific duration threshold
- Test any performance counter against a baseline value, or a percentage of a baseline value
- Fire any combination of 11 different actions (Send Email, Send SNMP Trap, Execute SQL, etc.)
- Show event markers on the Performance Advisor Dashboard charts, including enhanced tooltips with supplemental information
- Write to the new Events Log, where a historical record of the event with a snapshot of all associated metrics is maintained
- Easily be applied to all monitored servers, groups of servers, or individual servers
Whew! I know this sounds like a lot, but one of the best parts is how easy custom conditions are to configure. Everything is point and click, and even complex conditions can be configured quickly.
To get started, open the top-level Custom Conditions > Conditions List node in the Navigator pane:
When you open this node for the first time, you will be prompted to download a base set of conditions created by SQL Sentry. You can also do this any time from the Tools menu. These will give you a good starting point for monitoring, and can also serve as a reference when creating your own conditions.
First I'll cover a simple example of what you can do with custom conditions using one from the included pack: High Compiles. Here's how it looks in the condition designer – you can specify a name, rich text description, definition, and various other parameters which control when the condition is triggered (evaluates to true):
This definition contains two comparisons: the first ensures that batches/sec is >100 before even trying the second, which tests whether compiles/sec is >15% of batches/sec. This is called short-circuiting, and it serves both to eliminate false positives and to minimize the amount of processing the custom condition engine must perform. We can see this in action here:
The above is the view presented in the Evaluation Status grid and the Events Log after a condition has been evaluated against a target, and it embeds the results for each comparison, expression, and operation. This info is invaluable for testing and troubleshooting.
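To make the short-circuit behavior concrete, here is a minimal sketch of the High Compiles logic in Python. The function name, thresholds, and sample values mirror the condition described above, but the code itself is purely illustrative and is not SQL Sentry's implementation:

```python
# Illustrative sketch of a short-circuited custom condition.
# high_compiles() is a hypothetical helper, not SQL Sentry's API.

def high_compiles(batches_per_sec: float, compiles_per_sec: float) -> bool:
    """True when compiles/sec exceeds 15% of batches/sec,
    but only once batches/sec clears the 100 floor."""
    # First comparison: cheap guard that filters out idle servers.
    if batches_per_sec <= 100:
        return False  # short-circuit: the second comparison never runs
    # Second comparison: compiles as a percentage of batches.
    return compiles_per_sec > 0.15 * batches_per_sec

# A quiet server never reaches the ratio test:
print(high_compiles(batches_per_sec=40, compiles_per_sec=30))    # False
# A busy server with heavy compilation trips the condition:
print(high_compiles(batches_per_sec=500, compiles_per_sec=120))  # True
```

Ordering the cheap guard first is what keeps both the false positives and the evaluation cost down, exactly as the condition designer does.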
Speaking of, you will typically want to test any new condition you create against monitored servers before assigning actions to it, to ensure you aren't bombarded with alerts due to inappropriate thresholds. To do this, simply click either Evaluate button in the upper right:
The condition selected in the Conditions grid on the left will immediately be queued for evaluation by the SQL Sentry Monitoring Service, and results will show up in the Evaluation Status grid on the right within a few seconds... even when testing against hundreds of servers!
How is this possible? Check out SQL Sentry dev lead Brooke Philpott's (b|t) excellent post on how we are leveraging new .NET 4.5 features to pull this off without causing thread starvation or other performance issues.
Next up is a more advanced condition which uses Jonathan Kehayias' adaptable Page Life Expectancy (PLE) formula:
This condition has performance counters on both sides of the comparison, along with some math to dynamically calculate the PLE threshold based on the buffer size. It also takes advantage of short-circuiting – the first comparison ensures that the buffer size is greater than 2GB (131,072 is the number of 8KB pages per GB) before testing the second comparison. As you can see in the example above, it has scaled the standard PLE > 300 seconds rule up to 588 seconds due to the larger buffer on this server.
Another very cool enhancement in use here is the Any instance. When Any is used, all instances for a counter are automatically tested independently – NUMA nodes 000, 001, 002, and 004 here – and they are automatically synchronized inside each comparison. In other words, pages from instance 001 are only compared with PLE from instance 001, and so on. This way you don't have to configure a bunch of identical conditions for each counter instance, one for each NUMA node in this case.
Creating Custom Conditions
To create your own, click the Create Custom Condition button at upper left. All conditions have access to Windows metrics, but you may want to access SQL Server or SSAS metrics as well, so select one of the sub-items as appropriate:
Next, select the data type of the first comparison (numeric, string, or date/time), then the value source type:
You have a variety of source types to choose from, and this is one of the aspects that makes this new feature so powerful. For the first time you can easily integrate any combination of values from any source to produce an intelligent "rule."
As you go through the dropdowns, the subsequent elements change based upon your prior selections, guiding you in the right direction:
In the shot above, the performance categories listed first (without the colon) are virtual counter categories exclusive to SQL Sentry. Just as in our performance reporting, we expose all data we collect. After all, why collect it if we aren't going to let you use it? A great example is wait stats. We eliminate innocuous waits and group the rest into friendly categories (Disk, CPU, etc.) and classes (Transaction Log, Parallelism, AlwaysOn, etc.), any of which you can now use for alerting purposes:
So in a few clicks you can have a condition which triggers if total disk waits exceed 500ms for more than 30 seconds... on any server. Further, response rulesets can easily be applied to control exactly when you are alerted.
Conditions can easily be embedded within other conditions. Here's an example of one that picks up high CPU, but only when a process other than SQL Server is involved:
It leverages the simple High CPU condition to short-circuit if CPU isn't greater than 90%. If it passes, it then calculates the CPU % not associated with sqlservr.exe, and triggers only if it is greater than 25% of the total.
Whenever you save a condition, if no actions have been defined you'll be prompted to add them:
If Yes, you'll see the Actions Selector where you can choose from any of the 11 different actions which can be taken in response to the condition:
The new Send to Alerting Channels action (the default) allows you to control how/where the alerts are presented throughout the client app. The different channels can be individually selected from the Condition Settings tab once the action has been added. Just like all other conditions/actions in SQL Sentry, these can be set globally where they will apply to all servers in the environment, and easily overridden at the group or server level as needed.
Once enabled, all custom condition/action combinations show up in the newly redesigned Conditions pane at right, where alert settings, rulesets, and windows can easily be adjusted:
The initial alerting channel is for the Performance Advisor Dashboard (see below for how this works), and more are on the way.
When Send to Alerting Channels is selected, events will always be logged to the new Events Log:
This is the place to go to see status for all condition events, active or completed. The grid lists all events for the selected date range, but can be easily filtered via the column headers or pre-loaded filters. If you click on any row you will see the stateful condition definition at bottom, and a richly formatted description of the issue at right. Notes can easily be added, users assigned, severity changed, and events can be snoozed or closed from here. If you click an End Time cell you will see the metrics which caused the condition to evaluate to false again, thus ending the event.
Events on the Dashboard
From the Events Log you can quickly jump to the Performance Advisor Dashboard by right-clicking an event to see what was happening on the server at the time. You will see event markers on all associated dashboard charts:
In this case, the condition references both waits and disk latency counters, so the system recognizes this and overlays event markers on the appropriate charts, along with a warning glyph which, when clicked, shows the full details for the events in range. These event markers are on by default whenever using the dashboard, but can be toggled off using the toolbar.
To share your conditions, simply right-click any condition and select Export. A .condition file will be created which can be imported into any other SQL Sentry v8 environment. Soon you will have the ability to share your creations in a more integrated fashion via our new cloud portal, with your other SQL Sentry environments as well as the community at large.
We're Just Getting Started
With SQL Sentry v8 we've taken intelligent alerting to an entirely new level, and we've done so in a way that is simple for anyone to understand and use. We're going to continue to enhance this feature, and to release new conditions, so be sure to download them when prompted. I'm excited about the possibilities there, but even more excited to see the kinds of conditions that you are going to dream up!
Coming up I'll be covering the other major features in v8: Baselining and the SQL Sentry Cloud Portal. Stay tuned...
Greg (@SQLsensei) provides strategic leadership for SentryOne and is intimately involved with product design and development. Here he covers new features and how to use the software to optimize SQL Server performance. Whether you're an existing customer or evaluating the software, be sure to check out Greg's blog for powerful insights.