PERFORMANCE BASELINES AND BENCHMARKS FOR MICROSOFT SQL SERVER

Introduction: A Little Known Skill of Elite DBA's

Google AuthorTwitter: @keklineBlog: kekline.com

The ability to conduct a database performance test to establish baseline measurements and workload performance benchmarks is one of the skills that strongly distinguishes the senior DBAs (database administrator) from the junior DBAs. Once a DBA is proficient in baselining and benchmarking database performance, he or she hates to operate without them because they are so valuable.

The Challenges of Database Performance Baselines and Benchmarks

First, a quick explanation of the differences between a baseline and a benchmark, plus some more bits of nomenclature.

  1. A Baseline is a known value against which later measurements and performance can be compared. For example, 60-70 beats per minute is considered a healthy baseline value for a resting heart rate for an active, middle-aged person.
  2. A Benchmark is a test at higher levels of pre-defined load for comparison against a baseline. To continue the example, a runner in a marathon might see a heart rate of 110 to 130. The marathon race is the athlete’s performance benchmark, and from that benchmark they can extrapolate many other conclusions about their overall physical fitness and maximum potential.
  3. A High Watermark is a benchmark that establishes the maximum load at which the server can operate without performance degradation. In the heart beats per minute example, the American Heart Association has already established a high watermark for us at 220 minus your age.
  4. A SUT (system-under-test) is the database server under analysis (or the athlete, to stick to the analogy).

Just like in the health example, the baseline metric(s) literally define what is normal for the database application’s performance and, intuitively, we know that we can only know what is abnormal about a SQL Server by first establishing what is normal. For a DBA, few metrics are more valuable or have a more far-reaching impact than a baseline.

The purpose of baselining and benchmarking is that you know, definitively, what is normal and abnormal for your SQL Server. Without that knowledge, you are forced to rely on broad rules of thumb and hunches.

There are a number of reasons that few DBAs master performance baselining and benchmarking. We could just say, “It’s hard!” and leave it at that. We can also very safely say that DBAs are so busy coping with the most urgent issues of the day that there is no time left over to conduct a time- and resource-intensive set of tests. Both of those statements are very true, but they don’t go far enough. Instead, let’s take a moment to more fully characterize the most challenging issues with creating baseline metrics and benchmarks.

First, the ability to build baselines and benchmarks requires a lot of knowledge about:

  1. Hardware, including the interaction of I/O subsystems and memory.
  2. Windows Server OS and, very likely, a hypervisor, host, and guest OS of any virtual machines.
  3. SQL Server internal operations, like transaction log flushes, locking, and latching.
  4. Configuration options variables, both in SQL Server and in Windows Server.

After all, if you’re trying to construct a performance baseline for a given database application, but you change even one aspect of the underlying hardware, you could reasonably expect an entirely different set of values from the baseline.

Second, the DBA must also possess a thorough understanding of:

  1. The business needs of the database application.
  2. The database design.
  3. The type of work performed by the application, if not the actual transactional workload.

And if that were not enough, the DBA also needs to know a good deal about monitoring the system under test (SUT), as well as how to assess the metrics captured by those monitoring tools.

The True Value of Baselines: The Foundation for Service Monitoring

The challenges in collecting a baseline can be daunting! But the benefits of having a baseline are well worth the time and energy. The baseline isn’t a valuable thing in and of itself. It only gains value when you use it as the basis for service monitoring; that is, observing and assessing the health of the database service in real-time.

When I first started as a DBA in the early ‘90s, the extent of my service monitoring was my telephone. If it rang a lot with unhappy users at the other end, then I knew service was bad. If my phone didn’t ring, I assumed that everything was ok. It did not take long to realize how foolish that approach was. For one thing, users are never happy when they call about a problem. They are already mad that things are not meeting their expectations. That anger rubs off on you, the DBA, by association. You lose credibility.

Instead, why not find out what is normal for my server, and then observe server behavior proactively and quantitatively? As soon as I implemented service monitoring, I found a variety of benefits:

  1. We can avoid depending on “user experience” as the key indicator for performance. This is especially important when we have specific users who might negatively impact other users; e.g., the manager who pulls resource-intensive reports at a time when everyone else expects good performance for daily data entry.
  2. We can avoid arguments with other IT teams, or win those arguments, by proving exactly when an issue is related to our SQL Server or is from outside of the SQL Server. (We call this “the blame game.”)
  3. We can find problems even when users aren’t on the system. In the best case, we can fix the issue before users ever notice a problem. In the worst case, we can explain that the issue is known and a resolution is close at hand. In either case, we retain or even build our credibility within the organization.
  4. We provide “consistency of service.” Most users get angry when we fail to meet their expectations. Part of the problem for many DBAs is that they don’t know the proper expectations themselves. With a solid baseline, supplemented by multiple benchmark measurements, the DBA can ensure that both they and their users have consistent service and well-managed expectations.
  5. We become familiar with various types of problems that arise even when the server is behaving normally. For example, we might learn about some application design problems that require a response on our part, say frequent and long blocking chains. In some cases, you also discover problems that, ironically, have no response or can be safely ignored, say intermittent and brief network hiccups.

In this white paper, you’ll get a quick crash course in building a database performance baseline and how to collect one or more benchmarks, how to incorporate your findings into service monitoring, and how the SQL Sentry tools take all of the labor out of the ordeal.

 

Table of Contents

Constructing Your First Baseline

In this section, we will take you through the steps to construct your own baseline, as well as the most important benchmarks you will need to proactively manage your database systems.

The first bit of advice is to always work from a project plan, complete with a timeline, deadlines, and assigned roles. The test plan should essentially be a paper representation of everything that the performance baseline or benchmark will encompass. It should walk through the test in step-by-step detail describing:

  1. When and how the test environment is restored to initial state.
  2. When and how the workload is invoked and scaled.
  3. When and how the performance metrics are collected and stored.
  4. How long the test should run and when the test is considered complete.
  5. How to respond to failures or new test runs.
  6. Who is the test lead and who makes the call to abort a long-running test; especially important when a test fails.

I can tell you from hard-won experience that many weeks of hard work were ruined because the person or team performing the benchmark failed to prepare properly. Such mistakes were caused by forgetting a cardinal rule of benchmarks: a benchmark must produce results that are both reliable and repeatable so that we can foster conclusions that are rational and actionable.

Keeping the “reliable and repeatable” mantra in mind necessitates a few extra steps, especially in benchmarks.

The goal of the baseline is to understand and measure server performance under normal conditions, while the goal of a benchmark is to understand and measure server performance under a specific, pre-defined workload. Does the application eat up all the CPU cycles made available to it? Does it need so much memory that paging occurs? Does the application code thrash the tempdb database? Are all the user requests serviced quickly? After you understand how the application acts under normal load, you will be able to answer these questions.

If you have never done any of this before, you will want at least three distinct measurements for a SUT:

  1. A “normal” workload (the baseline and reference point for all further benchmark tests).
  2. A high watermark.
  3. The peak maintenance workload.

This last measurement is often overlooked by administrative staff, who simply trust that all backups, index rebuilds, and DBCC checks can finish in time and with minimal interruptions. But as we said before, if you don’t know what is normal for these workloads, how can you tell when it’s abnormal?

Back to TOC

Five Steps to Running a Successful Baseline / Benchmark Test

In the following section, we will cover the steps to run a successful baseline or benchmark test. There are a couple of hallmarks of a successful database performance baseline or benchmark test:

  1. The test is managed like a project.
  2. All aspects of the SUT are fully documented.
  3. The purpose and business requirements are understood and well-documented.
  4. The collected metrics are analyzed and re-incorporated into monitoring and alerting for the system.

The following steps, as shown in Figure 1, help ensure your success.

5 Steps to Successful Baseline/Benchmark Test

Figure 1: Five Steps to Run a Successful Baseline/Benchmark Test

Step 1: Define Requirements & Specifications

When gathering a baseline or running a benchmark test, it is important to clearly define the requirements in terms of functionality, performance, and scalability. Of course, your baseline is the set of metrics for the server under “normal” workload. However, by restating the requirements explicitly, you ensure that any assumptions built into the application or administrative approach are corrected, or at least documented.

Business requirements for the baseline/benchmark provide the DBA context for the expectations of their customers, the end users.

As a first step, define the business requirements and in non-technical terms. Some examples might be:

  1. “System cannot be down for more than 45 minutes”.
  2. “Report must return results within 3 seconds under a concurrent user load of up to 120 users”.
  3. “The system must sustain 500 batch requests per second without blocking”.
  4. “Data must be kept live for 3 months”.
  5. “Backups must be stored on the server for 7 days”.

This process of requirements definition frequently helps the DBA to understand the true needs of their customers, i.e. the end users, and provides important context for the technical requirements expected of their systems.

When conducting “what if” benchmarks, only ever alter one component or setting at a time.

However, many benchmarks are intended to prove the feasibility or benefit of a proposed architecture. For example, the DBA may have proposed enabling SQL Server’s database compression features on an important production application. How will that affect performance? Only a careful benchmark test can reveal what is in store. However, a key to successful “what if” benchmarking is to alter only one important component or setting at a time. If more than one thing is changed, how can you tell which one contributed to new measurements? Did one change help by +4x, but the other change hurt by -2x, leading to a seemingly acceptable +2x in improvement?

Back to TOC

Step 2: Construct a Workload

When building your first baseline, you may not need any more workload than is already available on the production application. However, collecting a production workload and then replaying that workload on a test server is easier said than done, especially when factoring in normal user latencies like keying time and think time. Figure 2 offers a couple of shortcuts which are often good enough, especially for “what-if” benchmarks.

Constructing a Workload

Figure 2: Constructing a Workload

Of course, the best database benchmarks are those that accurately and reliably reflect the applications and configuration of your own database infrastructure. On the other hand, the amount of work that goes into extracting your own transactional workload can be immense. The easier route is to learn and run industry standard benchmarks, such as those of the Transaction Processing Council, which reasonably simulate the workload of your SUT and then extrapolate the test results for your environments.

A Good-Enough Workload with HammerDB

My current favorite tool for performing TPC-like benchmarks is HammerDB, shown in Figure 3. HammerDB is a free, open-source load testing tool, which can run a workload based upon the TPC-C and TPC-H benchmarks on several different database platforms. It allows you to simulate heavy transaction processing and business intelligence workloads.

HammerDB is especially useful to understand the performance implication of enabling or disabling specific OS or database features, alternative hardware choices or VM architectures. But because it is a canned workload, you must take pains to study that workload so that you can effectively extrapolate the possibly great difference between its workload and your own workload. Your mileage may vary, as they say.

HammerDB Testing a TPC-C Workload

Figure 3: HammerDB testing a TPC-C workload on Microsoft SQL Server

HammerDB is distributed via Sourceforge and is usually updated several times per year by the all-volunteer development team. It is easy to learn and can drive very heavy workloads against your database servers with minimal configuration or setup. It is possible to modify the workload HammerDB drives; so, if you want to drive a heavy workload that is entirely of your own creation, it is possible to do, assuming you can write SQL code. If you encounter difficulties, HammerDB features great documentation and an active discussion forum where the developers respond quickly to questions and are very helpful.

Better Workloads with Custom Apps

If you wish to test in-house created application workloads, often your best compromise between speed of execution and realism is to enlist one or more developers to create a variation of the client application which generates a randomized but realistic workload. These do not need to be pretty or fancy. They just need to be a scalable and bug-free way to generate real SQL Server workload against the database under test using the same sort of queries and transactions that real users would generate.

Best Workloads: Real User Activity Replayed with SQL Server 2012 Distributed Replay

Starting in SQL Server 2012, Microsoft added the Distributed Replay Utility to run a multi-threaded workload against SQL Server, initially for benchmarking the impact of an upgrade from one version of SQL Server to another. However, it also offers a fully scalable solution to simulate mission-critical SQL Server workloads, conduct performance testing, and undertake “what-if” tests and capacity planning.

Distributed Replay Architecture

Figure 4: Distributed Replay Architecture from MSDN

Distributed Replay is composed of four main components, as shown above in Figure 4 and, while free, it can be complex to work with. The administration tool and workload controller make up the first two components, while the client(s) that generate the workload and push it to the target server make up the last two components. Since Distributed Replay uses trace files for its workload, you’ll also need to know how to create or capture SQL Server traces.

More information about the Distributed Replay Utility is available:

  1. Microsoft’s introductory information on the tools and features - http://msdn.microsoft.com/en-us/library/ff878183.aspx
  2. Jonathan Kehayias at SQLskills.com, Installation and configuration best practices - https://www.sqlskills.com/blogs/jonathan/installing-and-configuring-sql-server-2012-distributed-replay/
  3. Jonathan Kehayias at SQLskills.com, Instructions on Running the utility - https://www.sqlskills.com/blogs/jonathan/performing-a-distributed-replay-with-multiple-clients-using-sql-server-2012-distributed-replay/

Back to TOC

Step 3: Isolate and Document

It seems like an obvious requirement - that the SUT should be isolated from other production, development or test environments. Sure, it sounds easy enough. But in today’s world of consolidated disk storage and consolidated servers, not to mention virtual machines, this can be both expensive and difficult.

If possible, you want your benchmarking environment to share literally no components with other active systems. As you can guess, this need for isolation is one of the most expensive elements of a good baseline or benchmark. (The other expense is the man-hours invested in the project.) Imagine, for example, you were benchmarking on a computer that was also a user’s workstation. How could you be certain that a specific performance anomaly wasn’t actually the user’s Outlook application synchronizing with the Exchange server? That sort of shared environment can completely invalidate a benchmark.

Other elements of isolation that are usually necessary are 1) separate servers to generate a workload against the benchmarked environment, and 2) separate servers to collect and store performance metrics.

Isolate and Document System Under Test

Figure 5: Isolate and Document the System under Test (SUT)

I’ve seen many teams attempt to analyze their benchmark results, only to discover that they have no idea what the Windows registry and other settings are for the server they were testing on. Did the server enable Lock Pages in Memory? Was the NIC set with Jumbo Frames? Which power plan was the server using? Clearly, it is critical to fully document the SUT. Be sure to capture all of the following bits of information in a document, also shown above in Figure 4:

  1. Windows configuration details, including the Windows Registry.
  2. Hardware configuration details, including memory, storage subsystem, CPU (including power settings and hyperthreading), NIC settings, and HBA settings.
  3. SQL Server configuration settings at the instance, database, and connection levels. Also be sure to catalog the sp_configure settings, as well as the sp_dboptions settings for the user database and system databases.
  4. The database schema, including index statistics and metadata.
  5. The SQL transactional workload, including validation of the SQL transactional workload for the test at hand. I’ve encountered many teams who, upon analysis of their benchmarks, realized that they were running boatloads of SELECT statements for a benchmark of write speed. Whoops!
  6. Full descriptions of all of the metrics and instrumentation used to monitor the benchmark. Some tests can show very different results simply because the second test run used different metrics with a lighter overhead.

Quick Tip! You may find the Microsoft Assessment and Planning (MAP) Toolkit very helpful in rapidly collecting and synthesizing a strong core of documentation about your SQL Server and Windows server infrastructure.

Back to TOC

Step 4A – Run the Test

If you constructed the workload properly, running the test itself is pretty easy, as shown below in Figure 5.

Tips For Successful Baseline/Benchmark Test Run

Figure 6: Tips for a Successful Baseline / Benchmark Test Run

There are a couple of tricky parts for a novice. First, a rather long and unpleasant learning experience for me, back in the day, was an effective way to quickly refresh the entire test environment. In those days, we had to restore a rather large database, reboot all the servers, and reinitialize our monitoring instruments. While you can still use this approach, there are newer and faster alternatives, such as database snapshots and, my current favorite, restoring a VM and/or VM snapshots.

Don’t be afraid to abort the test if things go awry early on. And, as mentioned earlier, only change one aspect of the SUT per test iteration.

Back to TOC

Step 4B – Monitoring While the Baseline / Benchmark Test is Running

You must determine which server you will use to monitor your SQL Server, as well as the location of any monitoring log files. The best practice is to monitor the SUT remotely, so that it is not encumbered by the overhead of collecting metrics and/or writing to the same I/O subsystem used by SQL Server.

Monitoring with Microsoft native tools has many issues. Read our white paper entitled SQL Server, SharePoint & Windows Alerts to learn more.

Because of these performance concerns, the use of performance counters may need a bit of tuning. You will need to carefully evaluate the number of metrics and polling frequency that offers the best mix of insight versus overhead. For the initial baseline, however, it is recommended that as many counters as desired be used with the highest frequency available. Subsequent tests can collect fewer metrics and/or at a lower polling frequency. Once you have some experience with evaluating the resulting data, you will probably begin to limit the collections. In the absence of any other recommendations, collect the following throughout the baseline or benchmark test:

  1. The most salient PerfMon counters detailing overall server resource utilization with a tight polling frequency between every 3 to 30 seconds, depending on your needs. Refer to Appendix A for a list of suggested PerfMon objects and counters when not using SQL Sentry Performance Advisor.
  2. The top twenty non-system wait statistics showing SQL Server behavior, using the DMV sys.dm_os_wait_stats, every 1 to 5 minutes. Don’t forget to leave out benign queue and timer waits, as illustrated by Paul Randal in “Wait Statistics, or Please Tell Me Where it Hurts.” for a quick list of some important wait statistics, refer to appendix b.
  3. In some cases, depending upon the goal of the benchmark, additional DMVs are helpful, especially when correlating their results against wait statistics and PerfMon counters. For example, you may also choose to query every few minutes from:
    1. sys.xxx_memory_cache_xxx, sys.xxx_clerks, sys.dm_os_process_memory for all things memory-related.
    2. sys.dm_exec_xxx_stats for individual query, procedure and trigger usage statistics.
    3. sys.dm_os_threads to figure out thread memory consumption.
    4. sys.dm_exec_requests to gauge blocking and wait times and reasons for specific queries.

Quick Tip! Don’t forget about tempdb! Tempdb is a major source of resource consumption and contention on many multi-database systems. Collect information about tempdb performance as you would any other production database when you test.

Monitoring Best Practices with Native Tools

  1. If performing a high watermark benchmark, you may wish to run any special SQL Server monitoring queries using a Dedicated Admin Connection. Many restrictions apply.
  2. When testing a SQL Server upgrade, monitor the PerfMon counter SQL Server: >Deprecated Features – Usage
  3. When monitoring overhead is high:
    1. Minimize the number of counters you are monitoring. Less monitoring = less overhead.
    2. Record monitoring logs on a different machine than the SUT.
    3. Adjust to polling frequency when overhead is high. You get less frequent data points, but a commensurate reduction in overhead as well.
    4. Don’t monitor more metrics than necessary.
    5. When experiencing extreme overhead, send logs directly to disk without watching the metrics, since displaying the metrics can add latency.
    6. Don’t use Profiler at all. And only use server-side traces when necessary, with filters in places such as limiting the trace to the database under test and tempdb.

Back to TOC

Step 5 – Assess and Interpret

Long experience at assessing and interpreting the results of baselines has taught me not to rely on large tables of data with lots of data points. The mind glazes over quickly. Don’t throw away the source data held in the monitoring log files. Instead, use Excel to load the monitoring log data into a single graphic representation that makes sense to you. It provides enough information to interpret the results, and makes it easy for you to understand the overall trend of the performance metrics.

Write a summary document to record all of your findings!

The Summary Document is the second most important product of a baseline or benchmark test. For baselines, the document should summarize things like common statistical values for important metrics, including min, max, average, median, and mode values. For benchmarks, the document should also summarize any important extrapolations and trends learned in the test. For example, in a benchmark to see how the current server performs with database compression enabled, the document might summarize that for every X amount of buffer used with compression enabled, CPU utilization increases by Y percent.

The Summary Document should also take into account the impact of important system processes. For example, the SUT might be able to sustain a higher number of concurrent users except during the preventative maintenance cycle due to resources required for processes like backups, DBCC processes, or index maintenance. Similarly, the document should take into account important business cycles, such as the end of month accounting close or an important peak usage time.

Back to TOC

Step 6 – After Testing

Many IT teams make the mistake of thinking about their baselines and benchmarks as self-contained entities. However, you will find the real value of both baselines and benchmarks when their findings are used to fortify monitoring and alerting of your SQL Servers. Your baseline provides context for alerting and reacting to problems.

If you haven’t already done so, it is urgent that you implement an ongoing, 24x7 monitoring and alerting system based upon your findings. The native tools provide a limited monitoring and alerting capability as described in our white paper SQL Server, SharePoint & Windows Alerts.

Back to TOC

Baselining in SQL Sentry Performance Advisor

After spending all of this time describing how to collect a baseline and how to run a performance benchmark, here is some wonderful news: If you use SQL Sentry Performance Advisor, you don’t have to do either. Performance Advisor comes with built-in temporal baselines visible in the upper right corner of the Dashboard, starting in v8, as shown below in Figure 6.

SQL Sentry Custom Conditions

Figure 6: Performance Advisor Dashboard Dropdown

Typically, you are able to choose between one of three built-in baselines. It is a smart interface, however, and the built-in baselines available from the list will change depending on the time range of the active date range. (That is, the Range Start and Range End on the upper left of Figure 6.) For example, if you are viewing a 3-day range of performance data in the dashboard, the Previous Day doesn't make sense to display, so we don’t provide the option. Also, the Range option is currently only available when viewing detail data (range <= 30 minutes). Otherwise, the Average option will be displayed as an overlay in the dashboard.

In Figure 7, shown below, you can see a Previous Day baseline with a Range style overlay on the Performance Advisor Dashboard. You can see that patterns emerge on several charts with spikes in CPU, wait statistics, and database I/O, just after 8:00 AM. The dashed lines and shaded area within, best illustrated in the Log Flushes graph on the middle bottom right of the dashboard, present a moving average showing min/max of normal values for that time of day. As it turns out, the spikes in waits, CPU, etc., are well within normal behavior for that time of day. Of course, normal doesn't necessarily mean acceptable. This functionality is impossible with Microsoft native tools.

SQL Sentry Custom Conditions

Figure 7: Performance Advisor Dashboard with Previous Day Temporal Baseline Overlay

When looking at longer time ranges, or when you choose it by selecting the Average radio button in the top-right corner, the Performance Advisor Dashboard will display the average value in each graph as a dashed line. As is normal for Performance Advisor, you may drill deeper into the interface or Jump To other sections of the interface, such as the Top SQL or Disk Activity tabs.

Custom Baselines

In the database industry, a benchmark is a test in which you collect the performance metrics of a system under a specific, pre-defined workload. Performance Advisor does not build in a workload generation feature, but what could be a better indicator of performance under load than to actually measure specific, pre-defined workloads? In other words, benchmarks in the general industry equal custom baselines in Performance Advisor.

So in addition to the built-in temporal baselines, you can create your own custom baselines that are specifically calibrated against specific time periods and workloads! To do so, simply click and drag to select a date range on any of the charts, right-click to invoke the context menu, and then select the Create Baseline menu option, as shown below in Figure 8.

SQL Sentry Custom Conditions

Figure 8: Creating a Custom Baseline

When you click the Create Baseline menu option, the Edit Baseline screen appears, showing the Average, Min, Max and Standard Deviation for each dashboard metric over the specified time range, as shown below in Figure 9.

Here, we’ll call this custom baseline "Business Hours" to represents normal and acceptable performance during that time. You can also select the checkbox in the Default column to choose which metrics are shown by default on multi-metric charts. You can also manually adjust the baseline value by editing a specific value in the Average column. For example, you might decide that the average value for Page Life Expectancy (17,423.0) is not representative and simply change it to your preferred value (just not 300, please). Clicking OK closes the dialog box and saves any changes.

SQL Sentry Custom Conditions

Figure 9: Edit Baseline Dialog Box

Once saved, the baseline is added to the Baseline dropdown list, as you can see in Figure 10 below, where the “Business Hours” custom baseline appears. When you select your new custom baseline, a horizontal line appears on each chart showing the baseline value for each metric. The line provides a reference point so you can easily spot unusual activity.

In Figure 10, you can also see that at around 10:10AM something started impacting disk waits, CPU, PLE, and page reads, causing them to deviate from the normal business hours level. Time to investigate!

SQL Sentry Custom Conditions

Figure 10: Displaying a Custom Baseline on the Performance Advisor Dashboard

From here you can highlight the range and Jump To other areas like Top SQL, Processes or the Event Manager Calendar to diagnose the root cause.

There are many other use cases like this for custom baselines. Here are a few:

  1. Create a baseline prior to making a configuration change on the server, then use it to immediately monitor how the change impacts performance.
  2. Create a baseline for intensive maintenance operations like integrity checks, index rebuilds, or backups, and periodically review the operation against its baseline to ensure it is still performing as expected. I've seen many times where a change to the network or the SAN by some other group can have a dramatic impact on maintenance ops, and this is a great way to catch this!
  3. Create periodic date-based baselines, such as "Jan 2014", "Feb 2014", etc., and use them to assess how performance is changing over time.

You can create as many custom baselines as you want, and edit them at any time by clicking the build button to the right of the dropdown. For multi-metric charts, you can change the default metric shown without editing the baseline by right-clicking it in the legend.

Back to TOC

Integrating Baselines and Alerting

These visualization capabilities are great, but the feature wouldn't be complete without the ability to generate alerts using baseline metrics. That's where the other new SQL Sentry functionality comes in. When a custom baseline is created at the Windows Server or SQL Server level, it becomes available for use in custom conditions at that level.

Using baselines in a custom condition is easy. Instead of setting an explicit threshold value on the right side of a comparison, select Performance Counter Baseline, and its value for the metric is automatically substituted. This way, if a baseline value ever changes, you won't need to update any conditions.

In the custom condition show in Figure 11 below, we've added three separate baseline tests with some basic math to control the amount of a deviation from the baseline value required to trigger the alert.

Custom Condition Builder Using Baseline Values

Figure 11: Creating a Custom Condition Based on the Business Hours Custom Baseline

The system knows to use the Business Hours baseline value for the counter on the left side of the comparison. Basically, the interface uses context-aware shorthand that will trigger the alert whenever Total SQL Server Waits are three or more times greater the baseline, or when PLE is half the baseline, or when Total Server Processor Time total is twice the baseline.

(Note: It is possible to select a baseline on the left side. But you'll forego the shorthand when doing so. Since it is generally reads easier with the counter on the left and threshold on the right, we recommend this approach anyways).

No other product provides as rich an alerting interface, nor as graceful a method of deriving and incorporating baseline values into your alerting system.

Back to TOC

Baseline and Benchmark Testing Best Practices with the Native Tools

No matter the size of your SQL Server estate, remember that you’ll benefit strongly from baselines and the contextual information they provide about your servers.

  • Collect a baseline of metrics so that you can see how your server acts under normal conditions.
  • Perform at least two more performance benchmarks.
    • One to determine the high watermark for that server’s performance.
    • And another to see that server’s behavior during peak preventative maintenance activity.
  • Write a detailed test plan and conduct the test like a well-managed project.
  • Stabilize and fully document the SUT, from the server all the way through to the application.
  • Build a true database workload that is as realistic as possible.
  • Run and monitor the test from other hosts. Record and store the monitoring logs apart from the SUT.
  • Collect only the metrics necessary for the test at a polling frequency that doesn’t incur a lot of overhead.
  • Analyze and document your findings in a summary document. Use charts and other data visualizations to drive home any lessons learned.
  • Incorporate the findings of a baseline or benchmark test into your service-monitoring regimen. If you don’t have one, implement one using the native tools immediately.

Back to TOC

Conclusion

A baseline shows you what SQL Server looks like when it is acting normally. It provides you with an empirical context for building a comprehensive service-monitoring regimen which, in turn, enables you to provide exceptional levels of customer service, uptime, and proactive management of your SQL Server infrastructure. Without a baseline, you risk many potential problems with your SQL Servers:

  • No idea about whether a particular metric is meaningful or useless for a given SQL Server and a given application.
  • No understanding of whether your servers’ behavior is keeping with its normal trend.
  • Depending upon user experience to know when a SQL Server is having a problem, which can result in a loss of credibility and personal prestige.
  • Losing “the blame game” to other departments of the IT organization.
  • Inconsistent service to customers.
  • Completely miss a problem or issue caused by an unusual surplus of capacity. For example, it might indicate that a particular service, process, or group of users cannot get to the server to do their normal work.

So if it is normal for your server to be heavy utilized and then, suddenly, it is lightly utilized one day, you can tell that something is out of the ordinary. Perhaps the network is having trouble and the users, your customers, are not connecting to your server for some reason.

In contrast, a benchmark shows you SQL Server performance under a specific, pre-defined workload. This can help you determine the SQL Server’s maximum performance capabilities and specific limitations of its capabilities, such as the maximum number of users it can concurrently support.

SQL Sentry Performance Advisor turns the long and arduous process of creating baselines and benchmark measurements into a simple 3-click process.

While it is possible to incorporate any baseline or benchmark findings into the alerting of the SQL Server native tools, it requires a great deal of manual work, Transact-SQL coding, and possibly application coding as well. Plus, using the SQL Server native tools requires you to manually conduct a database performance baseline or benchmark test, assess the findings, and then manually build out an entire set of alerts based upon those findings. Using Performance Advisor, you can automatically collect real system performance measurements, turn them into baselines, and then reference them in custom conditions to trigger the appropriate alerts. That way, you’re sure that you’re only alerted to situations which are truly significant and important to the monitored host.

Not only does Performance Advisor provide built-in temporal baselines, it also provides the ability to create custom baselines, in essence, specific performance benchmarks. In addition, Performance Advisor baseline metrics may be used as the basis for all of your alerting. So while other alerting systems struggle with these problems, Performance Advisor is designed to handle them all from the ground up - quickly, directly, and easily. Performance Advisor also makes it easy to scale out alerting throughout your IT enterprise by enabling you to set baseline-driven alerts for as many or as few servers in your global IT environment as you like.

For more information about general alerting and monitoring strategies, refer to the MSDN Monitoring Best Practices article at https://msdn.microsoft.com/en-us/library/windows/desktop/bb226833(v=vs.85).aspx.

For more details on SQL Sentry Performance Advisor advanced alerting and baselining, refer to the blogs of SQL Sentry CEO Greg Gonzalez at http://blogs.sentryone.com/category/baselines/.

Finally, for a free, full-featured, time-limited trial of SQL Sentry Performance Advisor, email sales@sentryone.com.

Back to TOC

Appendix A: PerfMon Objects and Counters Useful for SQL Server Baselining and Benchmarking

When configuring PerfMon for a baseline or benchmark test, consider using the following objects and counters. By default, set the PerfMon session to record the log file to a location other than the system under test. Set the polling frequency to be as low as is acceptable keeping in mind the added overhead created by a frequent polling interval. A typical polling interval might be anywhere from every 3 seconds to 30 seconds, or even every minute, under some baselines or benchmark tests.

Object Counter Rule of Thumb Notes
 
Paging File % Usage < 70% The amount of page file currently in use. High page file usage indicates memory shortages that are so severe they must be spilled to the I/O subsystem, resulting in a serious performance bottleneck.
Processor % Processor Time <= 80% The amount of total CPU in use across all system activities. The higher this metric is, the more likely users will be delayed.
Processor % Privilege Time < 40% of % Processor Time The amount of time spent executing kernel commands (i.e. real work) like SQL Server IO requests. This value should seldom rise above 40% of total %Processor Time.
Process(sqlservr) Process(msmdsrv) % Processor Time < 80% This is the amount of CPU used solely by SQL Server (sqlservr) and/or Analysis Services (msmdsrv) process threads. It’s even more important to monitor both of these metrics if the server is running both services simultaneously. In dedicated servers, you may monitor the single service in use on that machine.
System Processor Queue Length < 2 This is the single best indicator of CPU pressure on the Windows Server. It shows how many threads are currently waiting on an answer by a CPU on the server. Less than 12 per CPU is good/fair, less than 8 is better, and less than 2 is best
System Context Switches/sec < 1500 This metric shows the number of execution contexts switched in the last second. High-end hardware can now handle very high amounts of context switching, but in the absence of any other indicators greater than 6000 is a busy system, less than 6000 is fair, less than 3000 is good, and less than 1500 is excellent.
Physical Disk Avg ms/read < 8 This is metric indicates the time, in milliseconds, spent waiting to complete a read operation on average. It is also known as read latency. The most common physical bottleneck is usually the I/O subsystem. Microsoft’s guidance for HDD-based systems is that read latency of greater than 20 is poor, less than 20 is good/fair, less than 12 is better, and less than 8 is best.
Physical Disk Avg ms/write <1 This is metric indicates the time, in milliseconds, spent waiting to complete a write operation on average. It is also known as write latency. As with the previous metric, Microsoft’s guidance for HDD-based systems is that write latencies of greater than 4 is poor, less than 4 is fair, less than 2 is better, and 1 or less is best – on systems with a write-cache. On I/O subsystems without a write-cache, a write latency greater than 20 is poor, less than is 20 fair, less than 12 is better, and 8 or less is best.
Memory Available Mbytes =>300 An indicator showing the amount of physical memory, in megabytes, available to run processes on the machine. You generally want a few hundred MB available to handle the random file copy or similar activity needed for system administration.
SQL Server: Memory Manager Memory Grants Pending ~0 This counter shows the current number of processes waiting for a workspace memory grant within SQL Server. You want this value to be close to zero most of the time. If it is frequently greater than zero, then SQL Server needs more memory or is running T-SQL code that is wasteful of memory.
SQL Server: Buffer Manager Free List Stalls/sec < 2 This counter measures the frequency that requests for data buffer pages are suspended (a.k.a “stalled”) because there are no data cache buffers. If this metric is greater than 2, then SQL Server needs more memory or is running T-SQL code that is wasteful of memory.
SQL Server: Databases Log Flush Waits/sec ~0 This counter shows, effectively, the number of times per second that SQL Server must wait for pages to be written to the transaction log. Only highly-stressed systems should have a sustained value of greater than zero.
SQL Server: General Statistics User Connections Proportionate to Batch Requests /sec This metric shows the number of users currently connected to the SQL Server. For most workloads, it should roughly track with Batch Requests/Sec, rising and falling together. Taken with Batch Requests/Sec, it is a good indicator of overall busyness of a SQL Server.
SQL Server: Buffer Manager Stolen Pages/sec Proportionate to Batch Requests /sec This metric shows the number of 8k pages that were “stolen” from the data cache to satisfy other memory needs, like plan cache and workspace memory. It’s a good indicator of the rate of data flow among SQL Server caches and should remain proportionate to Batch Requests/sec.
SQL Server: SQL Statistics Batch Requests/Sec Proportionate to workload This shows the overall busyness of a SQL Server, that is, the number of batch requests the database engine receives per second. High-end servers running modern versions of SQL Server can sustain 10’s or even 100’s of thousands of batch requests per second on a well-tuned workload. (Note - this counter does not track all Hekaton transactions).
SQL Server: SQL Statistics SQL Server: SQL Statistics < 10% of Batch Requests/Sec This indicator shows the number of times that T-SQL compilations occurred, per second, including recompiles. In general for OLTP workloads, this metric should remain proportionate to the number of Batch Requests/sec. High values may indicate excessive ad hoc querying due to poorly coded T-SQL.
SQL Server: SQL Statistics SQL Re-Compilations/sec < 10% of SQL Compil-ations/sec This indicator shows the number of times, per second, that T-SQL compiled code attempted to execute but had to be recompiled before completion. This number is preferably at or near zero, since recompiles can cause deadlocks and exclusive compile locks. For most workloads, it should remain in proportion to Batch Requests/sec and SQL Compilations/ sec.

This list is for quick reference only. For a comprehensive list of potential PerfMon object counters you might wish to use, refer to the list of counters used by Performance Advisor in our on-line User Guide.

Appendix B: Top Wait Statistics Indicative of Performance Issues

The following wait statistics may be useful when creating a baseline or running a benchmark test. You need only collect the appropriate data from the sys.dm_os_wait_stats DMV (dynamic management view) while running your tests. You should also be certain to collect the time that the data is collected, since wait statistics are cumulative and cannot give you a moment-by-moment record of their growth over time. As with collecting PerfMon objects and counters during a baseline or benchmark test, record the results to a location not on the system under test. In addition, recognize that the polling frequency adds overhead. So collect only what you need, usually at a somewhat lower frequency than you might with PerfMon object counters, say, every 1 to 15 minutes. A few details about the top wait statistics follow:

CPU Pressure

  • SOS_SCHEDULER_YIELD: Indicates waits caused by CPU pressure due to frequently yielding a scheduler among threads.
  • CXPACKET: Represents the amount of wait time consumed by parallelized operations. This is not usually an indicator of “bad” behavior, although SQL Server’s default choice is to aggressively parallelize operations.

Locking

  • LCK_X, LCK_M_U, & LCK_M_X: Shows the time spent waiting on long term blocking. Often an indicator of poorly coded T-SQL, transactions, or isolation levels.

Memory

  • PAGELATCH_X: Shows the time spent waiting for a data buffer latch. A high value may indicate a shortage of memory for SQL Server’s data buffer.
  • LATCH_X: Represents the time spent waiting for a non-data buffer latch, such as in the plan cache. A high value may indicate a shortage of SQL Server’s non-data buffer.
  • RESOURCE_SEMAPHORE: Shows the time spent waiting for a memory grant from Windows Server to SQL Server. A high value may indicate that other processes on the server also need memory and are “stealing” it back from SQL Server.

I/O

  • PAGEIOLATCH_X: This is the time spent waiting for data to be loaded from the I/O subsystem to the SQL Server data cache buffers. High values may indicate either a slow I/O subsystem and/or inadequate memory to hold the amount of data needed by the workload.
  • WRITELOG & LOGBUFFER: This is the amount of time spent managing the transaction log I/O subsystem. High values in either counter may indicate that the SQL Server database transaction log is on a slow or overworked I/O subsystem.
  • ASYNC_IO_COMPLETION & IO_COMPLETION: These wait statistics show the amount of time spent waiting for general I/O operations to complete. High values may indicate a slow or overworked I/O subsystem.

Network Pressure

  • ASYNC_NETWORK_IO: Indicates the overall wait time SQL Server spends waiting for asynchronous network I/O, the most commonly used sort of network I/O. High values may indicate an inadequate or overused NIC.

Back to TOC