Elevate Performance across the Microsoft Data Platform

Fast resolution of performance issues. Actionable alerting, intuitive event management, and powerful performance analytics deliver fast and efficient database performance.

Get Started

"My only choice for SQL Server Performance Analysis"

Source: TrustRadius. The number one B2B software review site.

is now...

Learn More

SQL SERVER, SHAREPOINT & WINDOWS ALERTS

Introduction: A New Take on Alerting
Google AuthorTwitter: @keklineBlog: kekline.com

Alerting systems have been around for many decades now. In fact, SQL Sentry introduced its first alerting system, Event Manager, way back in 2004. Now, our new release of SQL Sentry v8 dramatically improves and enhances the capabilities of users who wish to be alerted to potential issues on their SQL Server and Windows Server infrastructure.

For much of the time since alerting systems were introduced, you could expect to follow a very simple template. You would simply define the metric or condition that the alerting system observes, and the response or action that the system should perform when a threshold on one of the metrics has been exceeded (sometimes called a threshold or trigger).

The promise of alerting is that it makes it possible for the right people to keep up with the right information that matters most to them.

Alert responses could include the execution of a program or the notification to a user in one or more delivery channels such as e-mail, Short Message Service (SMS), instant messaging (IM), via a voice portal (known as “robocalls”), desktop alerts and more. The promise of alerting is that it makes it possible for people to keep up with the information that matters most to them. Unfortunately, most alerting products are just too simplistic.

In this white paper, we will use the basic alerting system included with SQL Server Management Studio (SSMS) to illustrate and compare how the majority of alerting systems behave and how SQL Sentry's new alerting system, introduced in v8, is far superior. After all, these are the very reasons an opportunity for our tool existed.

In spite of how it sounds, this is not meant to knock Microsoft, the SQL Server team, or SSMS. The fact that SSMS even includes features like this is actually a feather in Microsoft's cap when comparing SQL Server to other database platforms, which charge for a separate product for the same or similar features. In this case, it is just the reality that this set of features has been largely ignored and experienced no innovation since the introduction of SQL Server 2000.

Table of Contents

Common Failures in SQL Server, SharePoint & Windows Alerting

Alerting systems fail in two common ways. First, the alerting system may generate so many uninformative alerts that staff begins to ignore the alerts, possibly causing them to ignore real and significant issues. Second, the alerting system may fail to raise the alarm on issues that staff wanted to be notified about. Often, the alerting system has both shortcomings simultaneously.

Many alerting systems are unable to ensure that the alerts they send are appropriate and noise-free.

Whether we are discussing SQL Server alerts, Windows Server alerts, SSAS alerts, or even SharePoint alerts, the goal for alerting is the same. Alerts should be both appropriate (catching all conditions that need attention within the context of a true problem scenario) and noise-free (by alerting only on the conditions that truly need attention). To achieve this, you should be continually tuning away non-meaningful server alerts. An equally important activity when resolving a problem or major system issue is to create new server alerts to warn of any future recurrences of the same problem or incident.

Two related, but lesser, failures of many alerting systems are routing (ensuring that the right people receive the server alert in the most appropriate channel for the context of the alert) and scope (sending only the type of server alerts which are appropriate to the severity of the condition).

The following sections will show you how to deal with the majority of problems arising from the common failures of alerting.

Choosing the Appropriate Response for an Alert

Alerts are important. Otherwise, you would not want to be notified when they are triggered. On the other hand, all alerts are not created equal, and do not require an equal response. Some are the equivalent of a "five alarm fire" for firemen, while others should merely be recorded for determining trends or should raise a red flag indicating that a low-level problem needs further investigation.

All alerts are not created equal, and do not require an equal response.

However, the native alerting capabilities within SQL Server and most commercial third-party alerting tools are very limited in the ways in which they allow users to respond. The following example, based on SQL Server's native alerting feature, shows how SSMS allows you to handle an alert response:

SSMS Alerting

Figure 1: SSMS New Alert Window

Notice that in SSMS, shown above in Figure 1, you are only able to respond in one of two ways:

  1. Execute a SQL Server Agent Job
  2. Notify one or more Operators via e-mail, pager (SMS), and/or net send (the Windows command to send a pop-up notification directly to a desktop client)

This feature introduces just two means of responding to an alert, both with plenty of inherent limitations. The first option, a SQL Server Agent Job, provides a means of programmatic response. But it must be encapsulated in a SQL Server Agent Job, with a great many limitations of its own. For example, it is difficult to use SQL Server Agent Jobs for issues outside of SQL Server without lots of custom T-SQL programming, using the xp_cmdshell system stored procedure, even more so for Windows Server or SQL Server Analysis Services (SSAS), and often difficult for staff to automate. Furthermore, any additional work needed by the alert must be custom coded by the DBA, and may require opening big holes in system security because of the need to use functionality such as the aforementioned xp_cmdshell.

Since many alerts are important, but not critical, it is likely that the available routing options for an alarm (email, pager, and net send) are insufficient. For example, many IT teams like to keep a log of non-critical alarms, while having only a small handful of critical alarms which actually page them in the middle of the night and interrupt a good night's rest. It is possible to meet that requirement using SSMS, but it is a major custom programming project in its own right.

In response, SQL Sentry v8 enables a much wider variety of response actions to an alert:

SQL Sentry Actions

Figure 2: SQL Sentry Actions Selector: Custom Conditions Dialog

SQL Sentry provides a variety of optional methods of response whenever an alarm is raised, including the ability to automatically collect deeper troubleshooting information (Run QuickTrace), invoke an SSAS job, send an SNMP Trap to another Application Performance Management system (APM) such as Operations Manager or Tivoli, and much more. Additionally, SQL Sentry allows you to use any combination of the response types and, combined with features such as alert filtering, gain even more control over alert routing and alert scope. Our custom conditions allow the additional ability to send those alerts to our Performance Advisor dashboards, so you can see how the behavior that fired the alarm also affected performance.

Limited Conditions for Alerts

When considering the features of most alerting systems, including the native alerting system in SQL Server Agent, a common shortcoming is that alerts are limited to only those performance conditions found within SQL Server's relational engine and the Windows Server itself. With SQL Server's alerts, the alert system can also evaluate conditions on Windows Servers using WMI (Windows Management Instrumentation), which requires a great deal of knowledge about WMI, or SQL Server performance conditions (based upon the metrics derived from the sys.dm_os_performance_counters DMV).

A common shortcoming is that alerts are limited to only those performance conditions found within SQL Server's relational engine and the Windows Server itself.

Figure 3, shown below, illustrates the capabilities exposed by SSMS for alert conditions:

SSMS New Alert Dropdown

Figure 3: SSMS New Alert Type Drop-down List

These limitations are considerable. While you could raise Windows alerts using this alerting system, that is the only element of the Windows Server stack aside from SQL Server that is accessible. For example, you would not be able to alert on conditions within SSAS or SSIS with this alerting system. This is one of many issues.

Performance Advisor Alert Conditions

In contrast, SQL Sentry Performance Advisor allows alerting on any form of condition within Windows, SQL Server, SSAS, and just about any other area within the Windows stack as you can imagine, as shown below in Figure 4:

SQL Sentry Condition List

Figure 4: SQL Sentry General Conditions List

In addition, SQL Sentry allows a wide variety of more specific source types, enabling especially powerful custom alerts, which join any combination of sources and conditions, as shown in Figure 5 below:

Custom Condition Source Dropdown

Figure 5: SQL Sentry Custom Condition Source Type Drop-down List

By providing a much richer combination of sources for your alert conditions, you are able to ensure that SQL Sentry can monitor and trigger responses on a wider variety of system issues than other alerting systems are able to do. For example, you can create an alert which measures the proportion of CPU being used by three very different sources (SQL Server, SSAS, and Windows Server), where this is extremely difficult or impossible using other alerting systems.

Finally, it is also worth pointing out the value of the Repository Query source type in Performance Advisor. This source type enables you to construct an alert on any value stored in the SQL Sentry repository. Using a Repository Query allows you to add a new alert with zero added monitoring overhead, because the monitoring and collection of metrics has already been done. You are merely alerting on information that is already collected and tracked. Compare this behavior to other alerting systems where every new metric or query added means additional monitoring overhead.

In case the implications are not immediately obvious, SQL Sentry allows you to combine not only a wide variety of sources and source types, but you can also do so using a multitude of conditional combinations. This conditional processing of alert conditions makes this new feature especially powerful. SQL Sentry allows for IF-THEN-ELSE alerting in a way that other alerting systems simply cannot. For the first time, you can easily integrate any combination of values from any source to produce a truly intelligent "rule." We will discuss SQL Sentry Performance Advisor custom condition features in greater detail in an upcoming section, Noisy Alerts.

Event Manager Alert Conditions

In addition to the new alerting features introduced in Performance Advisor v8, SQL Sentry Event Manager has long held the leading position for alerting on conditions throughout the Windows stack, including SQL Server and SQL Server Agent, SSAS, SSRS, SSIS, SharePoint, and Windows. As shown in Figure 6, the type of conditions and actions available through Event Manager cover the gamut of SQL Server and Windows activity:

General Actions Selector Dialog Box

Figure 6: SQL Sentry Actions Selector: General Conditions Dialog

In the example shown in Figure 6, we have chosen to execute a SQL batch whenever a SQL Server Reporting Services exceeds its runtime maximum threshold. This is a unique capability not found in other alerting systems, since other systems only alert on SQL Server and possibly Windows conditions.

Event Manager Event Chaining

In addition to alerting on all parts of the Windows stack, Event Manager includes the capability to construct and administrate event chains. An event chain is a series of dependencies between events running on the same server or different servers. There is no limit to the length of a chain, or the number of events that can be defined for a chain. Event Manager also prevents the creation of circular chain references, so that your event chains will not get caught in a dangerous loop. Figure 7 shows the series of job steps, or “events”, which must occur on multiple servers to complete the entire Process Order History event chain:

Event Manager Event Chaining

Figure 7: Event Manager Event Chaining Editor

At each discrete step in the Process Order History event chain, specific conditions and actions can be specified. That way, the DBA or business process owner is able to ensure that the entire event chain completes and, if not, activates the appropriate action in response to the alert. For example, if the Update Measures step fails, it might retry that step a few minutes later, and if that fails, send an email to the administrator.

Noisy Alerts

A fundamental rule of effective alerting systems is that they should only notify you of conditions that truly matter. If the alerting system overwhelms you with unnecessary or unimportant alerts, they become aggravating noise. When users have constantly noisy alerts, in time, they are likely to ignore or miss an important alert somewhere in the deluge of unimportant notifications.

Truly effective alerting systems respond only to conditions that truly matter.

SQL Sentry Performance Advisor v8 offers two features to combat noisy alerts: Custom Conditions and Baseline-centric Conditions.

Custom Conditions

One of the biggest reasons that alerting systems are so noisy is that they can only trigger an alert based on a single threshold. For example, you might create a Windows alert to notify when your SQL Servers use more than 90% of the available CPU for a specified duration. But that is a simplistic measurement and may be misleading in that the SQL Server is under any form of stress. It is not uncommon for a SQL Server to show distress only when multiple metrics pass certain thresholds.

Conditions can easily be nested within other conditions. Here is an example of one that detects high CPU, but only when a process other than SQL Server is responsible for a sizable amount of CPU utilization:

SQL Sentry Custom Conditions

Figure 8: SQL Sentry Custom Condition Builder

In our example, we do not just leverage the simple High CPU condition to short-circuit if CPU is not greater than 90%; if that threshold is passed, the condition then also calculates the percentage of CPU not associated with sqlservr.exe, and triggers only if non-SQL Server CPU utilization is greater than 25% of the total.

In comparison, other alerting systems trigger a response when a single condition is met or hardcoded threshold is exceeded.

As previously mentioned, SQL Sentry v8 provides you with more choices when responding to your alerts. This feature makes it possible to keep track of problematic situations without having to rely on emails to know what is happening in your environment. But an even better option than offering more ways to respond to an alert is providing more features to reduce the overall number of alerts to only those that are truly informative and actionable.

Baseline-centric Conditions

SQL Sentry v8 introduces powerful new baselining features that are integrated with our alert conditions within the Performance Advisor monitoring system. These baseline measurements enable DBAs to tell what is normal for their SQL Servers. Baselines are certainly useful and informative, but are also very difficult to create by hand for most DBAs.

The integration of baselines with alert conditions takes alerting to a whole new level of context awareness and value by truly understanding what is normal performance for the server.

Performance Advisor v8 not only makes it easy to automate the collection, processing, and calculation of baselines, but it also makes those baselines actionable by integrating them directly into our alerting engine. Baselines clearly enable a DBA to tell when a condition is normal or abnormal, meaning that they can set alerts based not on a specific numeric threshold, such as a Windows alert of 90% CPU busy, but instead on the baseline value (i.e. what is the normal CPU utilization for a server under a particular workload?). They can also apply calculations to the baseline value, as shown in Figure 9:

Custom Condition Builder Using Baseline Values

Figure 9: SQL Sentry Custom Condition Builder Using Baseline Values

In this example, the baseline percentage of processor time is now based upon the baseline called normal day. The DBA is now able to create a custom condition which triggers the alert when the CPU exceeds what is normal by 1.5 times.

The great thing about this feature is that your alerts and custom conditions can be cloned (or scripted) and then rolled out to other servers. Performance Advisor enables you to create as many baselines as you like. So you could also create a separate baseline and set of alerts for as many scenarios as are appropriate to your environment; for example, baselines called busy day, month-end processing, and preventative maintenance processing. That way, they may have an entirely different threshold for a "normal day," but the alert always takes that into account when determining when to respond.

Analyzing Alert Cause & Effect

A common problem of alerting systems is that although they notify you of the occurrence of a specific condition (say, that CPU usage has exceeded 80%), they do not actually tell you why they exceeded that threshold, nor what other effects that condition may have had (for example, whether the system slowed down at all after seeing the CPU get that busy). If you want to know that information using the native SQL Server Agent alerts, you will need to buy or build a collection system of your own, since you have no way of performing research into past performance issues using the native tools.

By comparison, SQL Sentry directly facilitates troubleshooting of the cause and effect of an alert with the alert notification itself.

Performance Advisor, on the other hand, provides deep diagnostic information and historic metric collection. To get a quick summary of all of the event activity on a monitored SQL Server, check the Events Log, as shown below in Figure 10:

Performance Advisor Events Log

Figure 10: Performance Advisor Events Log

The Events Log shows the status for all condition events, active or completed. This includes SQL Server alerts, SSAS alerts, and Windows alerts. The grid lists all events for the selected time range, but can be easily filtered via the column headers or pre-loaded filters. If you click on any row you will see the fully-defined condition at the bottom and a detailed description of the issue at the bottom right. For the alert, notes can easily be added, users assigned to respond, severity changed, or snoozed or closed from here. If you click the End Time cell for a specific event, you will see the metrics which caused the condition(s) which triggered the event, thus ending the alert.

SQL Sentry Performance Advisor then makes this problem very easy to solve. From any of the listed alerts, you can quickly jump to the correlated time of the event in the Performance Advisor Dashboard to see exactly what was happening before, during, and after an alert by right-clicking the event and choosing Jump To... Dashboard. You will then see event markers on all associated dashboard charts, as shown below:

Performance Advisor Dashboard

Figure 11: Event Correlation on the Performance Advisor Dashboard

The Performance Advisor Dashboard uses the small markers at the bottom of each graph to show the time and duration of the alert condition. In this case, the alert condition references both waits and disk latency counters, so the system recognizes this and overlays event markers on two appropriate charts, along with a warning glyph which, when clicked, shows the full details for the events in range. And, as has been the case with Performance Advisor for many releases, you can also drill-down on the Performance Advisor Dashboard to see greater detail, such as the Top SQL Statements occurring at that time. These event markers are on by default whenever using the dashboard, but can be toggled off using the toolbar.

SQL Sentry provides two other alternatives for automating the collection of addition information whenever an alert condition occurs.

Event Manager Calendar View

First, Event Manager provides visual, Outlook-style alert tracking so you can tell exactly what failed, when, and what other activities were affected, as shown below in Figure 12. Its visual display combines tracking for scheduled tasks, such as SQL Server Agent Jobs, SharePoint Timer Jobs, and Windows Tasks, along with SQL Server performance events, such as blocking, deadlocks, and Top SQL statements. This provides added visual details for troubleshooting in the event that multiple failures occur at the same time.

By comparison, other alerting products make it impossible to determine the true chain of events when multiple alert conditions are detected simultaneously.
Event Manager Calendar View

Figure 12: SQL Sentry Event Manager Calendar View

This view enables you to quickly see overlapping events, long-running events, and failed events all in a single intuitive and easy interface. (This can be especially useful for SharePoint alerts, since SharePoint is built upon not only a SharePoint Server, but also a Windows Server and SQL Server. Seeing all three server events at once would otherwise be quite difficult.) There are a number of glyphs provided so you can see specific types of events at a glance. Your event views are also fully configurable and can be saved for later, repeated use, such as all of your backups, all of the Top SQL events, or all the failed SQL Server Agent events.

Quick Trace

Second, one of the built-in Action choices when an alert is triggered is to Run a Quick Trace. For a given high-impact alert, it may be especially helpful to collect a lot more system information than normal. Quick Trace gives you an instant snapshot of the activity performed by all of the processes on a SQL Server at the time. When invoked, a Quick Trace collects server-side trace and process-level data for a specified interval and then automatically correlates the data, effectively combining the functionality of Profiler and Activity Monitor together with a single click.

By comparison, other alerting systems continue collecting the same information they always do – before, during, and after an alert has been raised.
SQL Sentry Quick Trace

Figure 13: SQL Sentry Quick Trace Tab

As shown in Figure 13 above, a wide variety of metrics such as CPU, I/O, recompiles, cache misses, cursor operations, and much more are automatically aggregated. The details of a Quick Trace can be reviewed in the Quick Trace Viewer or on the Quick Trace tab, and easily grouped and sorted, providing a clear picture of the processes, hosts, applications, or users responsible for all activity during the sample.

The Quick Trace approach to process-level monitoring offers several distinct advantages over other alerting systems which require continual SQL Server process-level monitoring:

  • Quick Trace consumes much less system overhead, since it is only used for specific alerts, with specifically defined constraints, and for well-defined time periods.
  • Quick Trace consumes much less storage to store the collected metrics, since process-level information is only stored for the brief time period of the Quick Trace.
  • Quick Trace further reduces overall "noise" of the alerting system, since it collects only in problematic situations, as defined by the alert conditions.

Seeing Overall System Performance at the Time of an Alert

A common problem for alerting systems built for SQL Server is that they only alert on SQL Server itself or perhaps also on a few other elements outside of SQL Server. This is extremely limiting for the DBA who also has to manage many of the other SQL Server components besides the relational engine, for example SQL Server Analysis Services (SSAS), SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), SharePoint, and many other related processes and services. SQL Sentry enables you to see all non-SQL Server activities on your servers through two features: the Performance Advisor Overview and the Processes tab.

Performance Advisor Overview

SQL Sentry enables you to visually assess server behavior, both current behavior and long-term historic behavior, for SQL Server metrics. For example, the DBA can use the Performance Advisor Overview view, as shown in Figure 14 below, to show all of the SQL Server instances on the server LONDONSRV07 and the overall Disk I/O read history for the last two days, as shown in the bottom pane of the figure:

A common problem for alerting systems is that they do not provide holistic performance information across the entire Microsoft stack where the SQL Server instance is installed.
Performance Advisor Dahsboard Overview

Figure 14: Performance Advisor Overview Dashboard

In the example shown in Figure 14, we are examining three different instances of SQL Server running on the server LONDONSRV07 in the top panel, and seeing a two-day view of the performance metrics related to disk I/O reads (ms/Read) in the bottom panel, sorted by database. The bottom historic view is especially useful for observing the long-term behavior of the metric, in this example, the specific database (tempdb) with heavy read activity.

The Processes Tab

The Processes tab, a part of Performance Advisor for Windows, displays a grid view of all the relevant and monitored Windows processes. SQL Sentry eases this process by providing a selection of “well-known” categories of processes commonly deployed along with SQL Server.

A common problem for many DBAs is that they are unable to build alerts for applications commonly associated with SQL Server, such as SharePoint or SSAS.

By default, Performance Advisor for Windows collects information about well-known or "categorized" processes, such as those involving SharePoint, IIS, SSAS, SSIS, and SSRS. Metrics are displayed for each process, providing insight into the resource usage for both the individual processes and their associated groups, as shown in the Figure 15 below:

Performance Advisor Windows Processes Tab

Figure 15: Performance Advisor for Windows Processes Tab

You can also collect information about processes which are not in the default groups, through two different options:

  • Add your own well-known groups and specify which processes belong to them.
  • Configure an Uncategorized Process Filter, for the collection of uncategorized processes, such as a custom-written Windows Service.

You may also choose to do both in tandem. For instance, your company may use both purchased applications and internally developed applications. You might wish to define several new well-known process groups (such as SQL Server) pertinent to the monitored computer's workload, so that information about those processes will always be collected.

Non-conditional Actions for Alerts

Many alerting systems, including the native SQL Server Agent alerting system, trigger a single, specific response when a single, specific condition is met. This behavior is direct, obvious, and without any nuance. However, this approach is deeply limiting to users in two ways, due to the singleton condition and the singleton response. First, alerts are triggered to respond when a single condition is met. Second, when triggered, alerts have only a single response. This approach to alerting means that someone is going to get a lot of noisy and unhelpful alerts.

Other alerting systems deeply limit users in two ways, due to requiring only a singleton condition and allowing only a singleton response.

Consider the example below shown in Figure 16:

SQL Server Agent New Alert and Response Tabs

Figure 16: SQL Server Agent New Alert and Response Tabs

In Figure 16 on the left side, we have created a new SQL Server Agent alert that tracks the number of times SQL Server processes a deprecated feature, particularly Transact-SQL commands. On the right side of Figure 16, we have set the response such that if the usage of deprecated features rises above 5 per second, then the level one Help Desk staff will be both paged and emailed, while the level two DBA will receive an email – every time the alert is triggered.

Both the DBA and the Help Desk will probably be frustrated by this experience, especially if there are any nuances to this rule. For example, it makes a lot of sense to be on the lookout for use of deprecated commands and features. But what if a new application, originally written on an earlier version of SQL Server, is purchased and installed on a server? The staff might want to exclude this database from the alert, since it is known to run in an older compatibility mode.

Alternately, the staff might want to simply route messages for that specific database to a member of the support staff of the specific business unit that purchased the application. Unfortunately, they cannot do either due to the limitations of SQL Server Agent alerts, resulting in both groups getting plenty of noise alerts. So the support staff ends up being a frustrated middleman for alerts that should go to business or development managers instead of the DBA. (In all fairness, the DBA could write custom Transact-SQL code wrapped in a SQL Server Agent job that might handle this conditional processing. But this would be a big job in and of itself.)

In contrast, SQL Sentry can apply multiple conditions, with as many OR and AND evaluations as you need, to a given alert, as shown below in Figure 17 (and earlier in Figure 8):

SQL Sentry Custom Condition Editor

Figure 17: SQL Sentry Custom Condition Editor

As shown in the above example, you can create alert conditions that contain multiple parts, including source conditions from anywhere in the Microsoft stack, such as a combination of SQL Server alerts, SSAS alerts, and Windows alerts. In this example, we look at two alternative sets of conditions to determine whether to raise a single alert for high I/O, either based upon wait stats (the upper condition) or upon the PerfMon counters for both the average disk time for read and write latency. When either the wait stats metric or the two physical disk metrics pass their threshold, an alarm is triggered. So building an alert condition for the deprecated events scenario described earlier, in Figure 17, is an even simpler challenge.

SQL Sentry custom conditions directly support multipart conditions, as shown above. But we can easily go one step further by facilitating various alert routing or multiple alert actions depending upon the severity of the condition. So, let's reimagine the requirements of our deprecated usage alert scenario. In this more stringent scenario, imagine that the level one Help Desk needs to get the alert email if usage is between 5 and 10 per second, the level two on-call DBA needs to get the alert email if usage is between 11 and 20 per second, and the DBA manager needs to get the alert email if the usage is more than 20 per second.

A common problem with alerts is that they typically do not provide any contextual information about long-term trends describing why they were triggered. This shortcoming means that DBAs are basically forced to assume that every alert is a one-off occurrence. Further, they are unable to correlate the occurrence of alerts with specific operations, such as month-end report processing, specific activities such as DBCC consistency checks, or spikes in utilization and resource shortages.

By comparison, most alerting systems provide very little trending information.

When looking at SQL Server Agent job alerts, as shown below in Figure 18, we can see only high-level information regarding the last run of a particular job:

SQL Server Agent Job Activity Monitor History

Figure 18: SQL Server Agent Job Activity Monitor History

For example, we can see whether a job failed or succeeded, when it last ran, and when it will next run. These are all singleton points of information, useful only at the time of the particular job operation. They provide no historic context, nor any information which might enable you to make a conclusion about future performance.

Similarly, when examining SQL Server Agent alert information for trends, we can only review high-level details, such as the last time the alert was triggered and the number of times it occurred, as shown below in Figure 19:

SQL Server Agent Alert History

Figure 19: SQL Server Agent Alert History

As shown in Figure 19, the only information provided is the last time the alert was fired, the last time a response was triggered, and the number of times the alert has occurred since the counter was last reset. Not very informative for trending!

Compare the SQL Server Agent Job history default behavior with SQL Sentry Event Manager's Runtime Graphs. Whereas SQL Server Agent jobs can tell you how long a variety of jobs ran, the Runtime Graphs node displays an event object's historical runtime information, vastly improving the contextual information you have for deeper analysis, in two views: Runtime History and Runtime Aggregates. (Note that Runtime Graphs are not limited to alerts, but can also be used for long-running SQL statements and any other operation watched by Event Manager.) These graphs make it easy to identify runtime trends, which can be an indicator of or eventually predict future performance issues with the event or the server.

Runtime History Graph

The Runtime History graph shown below in Figure 20 is an event's actual runtime and status over the specified interval. The event, in this case, is a SQL Server Agent job called Update Measures:

Event Manager Runtime History

Figure 20: Event Manager Runtime History

Each bar in the chart shows the date and time that the event occurred, on the X-axis, and the number of seconds it took to complete, on the Y-axis. The color of the bar also shows whether the Update Measures job completed successfully (the blue bars) or failed to complete (the red bars). The dotted lines also show the event's maximum, minimum, and average run time. This information can be very valuable in determining whether the event can consistently complete processing within a consistent period of time or whether it varies significantly each time it runs.

By comparison, you might wish to know the broader statistics about the event, for example, if you wanted to see the median runtime rather than the average. Event Manager also supports this analysis using Runtime Aggregates.

Runtime Aggregates Graph

The Runtime Aggregates graph, shown below in Figure 21, displays the distribution of the event's actual runtime over the specified time interval. This data can be valuable for a number of purposes, such as helping to decide where to place a job in the schedule to minimize contention, or to determine if a given SQL Server is already over-scheduled:

Event Manager Runtime Aggregates

Figure 21: Event Manager Runtime Aggregates

When examining the runtime aggregate data of the Update Measures job in Figure 21 above, the fact that the median (i.e. most commonly occurring) runtime is between 4.4 – 4.5 minutes immediately pops out at you, as the value in the second and tallest column in the graph.

[Note that the Runtime Aggregates graph rotates axes as compared to those in the Runtime History graph. So the duration of the event is shown on the X-axis while the number of times the event has occurred is shown on the Y-axis.]

Alerting Best Practices

No matter what alerting system you use, keep in mind this list of best practices. Note that these recommendations are process- and personnel-oriented. They are generally technology-independent.

  • If issues are encountered that were not detected by your monitoring solution, consider them unresolved until a server alert is created to detect if they ever reoccur.
  • Route server alerts to the appropriate audience. For example, keep the DBA out of the alert chain for networking issues.
  • Escalate server alerts appropriately, so that only truly critical alerts disturb staff during their non-work hours.
  • Review and tune server alert thresholds and conditions regularly. Excise any noise alerts and false positives.
  • Review server alert metadata to detect trends at least weekly. For example, frequently occurring DNS errors might point to a problem with the router hardware or NIC configuration settings.
  • Write server alert response policies so that staff is clear on the amount of detail to record when responding to an alert. A minor server alert might do fine with a note telling who resolved the problem and when, such as "Acknowledged by KEK, 25-Mar-2014 2:56 PM CST." But a major or critical server alert should probably include a much more comprehensive recap of the remediation needed to fix the problem. That way, future staff members have a handy reference for fixing the incident that triggered the alert.

Conclusion

This article discussed the most common problems facing IT teams who implement an alerting system. By putting an alerting system in place, IT organizations are able to provide better quality of customer support by forewarning staff of problems, often before end-users are aware of a problem in their application, while promising to makes it possible for the right members of the IT staff to keep up with the right information that matters most to them.

Unfortunately, most alerting systems have a lot of problems and issues. The most common issues, which we discussed in detail, include:

  • Inability to choose the appropriate response for a given alert, whether for a SQL Server alert, a SharePoint alert, or a Windows alert.
  • Limited conditions and thresholds for alerts.
  • Noisy alerts, because they do not support customized conditions or because they do not have a context for “normal” discerned from an analysis of baseline conditions.
  • Limited or no ability to analyze the cause and effect of an alert, at the SQL Server Agent job level and at the overall SQL Server process level.
  • Limited or no ability to see overall system performance at the time of an alert, throughout other components of the Microsoft stack where SQL Server is running or across many servers in the production environment.
  • No ability to respond conditionally to alerts.
  • Inability to analyze alert trends, either individually for each alert event or in aggregate across all occurrences of the event over time.

While other alerting systems struggle with many or all of these problems, SQL Sentry Performance Advisor and Event Manager, available in a bundle known as the Power Suite, are designed from the ground up to handle them all quickly, directly, and easily. The Power Suite also makes it easy to scale out alerting throughout your IT enterprise by enabling you to set alert scope to the global IT environment (or smaller scopes) and by enabling easy export and import of your alerts.

For more information about general alerting and monitoring strategies, refer to the MSDN Monitoring Best Practices article.

For more details on SQL Sentry Performance Advisor Intelligent Alerting, refer to the blog of SQL Sentry CEO Greg Gonzalez.

SentryOne 15-Day Free Trial

  • SQL Sentry icon
    SQL Sentry SQL Server
    (10)
  • Win Sentry icon
    Win Sentry Windows/Hyper-V
    (10)
  • V Sentry icon
    V Sentry VMware
    (2)
  • BI Sentry icon
    BI Sentry Analysis Services
    (4)
  • DB Sentry icon
    DB Sentry Azure SQL DB
    (5)
  • DW Sentry icon
    DW Sentry Azure SQL DW
    (1)
  • APS Sentry icon
    APS Sentry Microsoft APS
    (1)
Let's Get Started:
All fields are required
(waiting for server validation)
Download SentryOne

By clicking download, you agree to the SentryOne Privacy Policy.

The best SQL query analysis and optimization tool is now completely free!

Get the standalone version of Plan Explorer and get to the root of tough SQL query issues.

LOOKS LIKE YOUR BROWSER IS OUT OF DATE

For your convenience here are links to supported browsers:

Your browser could be incompatible with this website.
If you close this window and continue you may experience less than optimal behavior.

Don't show this again