Performance Baselines And Benchmarks For Microsoft SQL Server

Introduction: A Little-Known Skill of Elite DBAs

Kevin Kline | Twitter: @kekline | Blog: kekline.com

The ability to conduct a database performance test to establish baseline measurements and workload performance benchmarks is one of the skills that most clearly distinguishes senior DBAs from junior DBAs. Once a DBA is proficient in baselining and benchmarking database performance, they never want to operate without them because they’re so valuable.

The Challenges of Database Performance Baselines and Benchmarks

First, a quick explanation of the differences between a baseline and a benchmark, plus some more bits of nomenclature.

  • A baseline is a known value against which later measurements and performance can be compared. For example, 60 – 70 beats per minute is considered a healthy baseline value for a resting heart rate for an active, middle-aged person.
  • A benchmark is a test at higher levels of predefined load for comparison against a baseline. To continue the example, a runner in a marathon might see a heart rate of 110 to 130. The marathon race is the athlete’s performance benchmark, and from this benchmark, they can extrapolate many other conclusions about their overall physical fitness and maximum potential.
  • A high watermark is a benchmark establishing the maximum load at which the server can operate without performance degradation. In the heartbeats per minute example, the American Heart Association has already established a high watermark for us at 220 minus your age.
  • The system under test (SUT) is the database server under analysis (or the athlete, to stick to the analogy).

Just like in the health example, the baseline metrics define what is normal for the database application’s performance, and, intuitively, we can only know what is abnormal about a Microsoft® SQL Server® instance by first establishing what’s normal. For a DBA, few metrics are more valuable or have a more far-reaching impact than a baseline.

The purpose of baselining and benchmarking is to let you know, definitively, what’s normal and abnormal for your SQL Server. Without this knowledge, you’re forced to rely on broad rules of thumb and hunches.

There are several reasons why few DBAs master performance baselining and benchmarking. We could just say, “It’s hard!” and leave it at that. We can also safely say DBAs are so busy coping with the most urgent issues of the day there’s no time left over to conduct a time- and resource-intensive set of tests. Both statements are true, but they don’t go far enough. Instead, let’s take a moment to more fully characterize the most challenging issues with creating baseline metrics and benchmarks.

First, the ability to build baselines and benchmarks requires a lot of knowledge about the following:

  1. Hardware, including the interaction of I/O subsystems and memory. (This applies even when using a public cloud, like Microsoft Azure® or AWS®).
  2. The Windows Server® operating system (OS) and likely a hypervisor, host, and guest OS of any virtual machines.
  3. SQL Server internal operations, like transaction log flushes, locking, and latching.
  4. Configuration option variables, both in SQL Server and in Windows Server.

After all, if you’re trying to construct a performance baseline for a given database application and you change even one aspect of the underlying hardware, you can reasonably expect an entirely different set of values from the baseline.

Second, the DBA must also possess a thorough understanding of the following:

  1. The business needs of the database application.
  2. The database design.
  3. The type of work performed by the application, if not the actual transactional workload.

And if this weren’t enough, the DBA also needs to know a good deal about monitoring the SUT and how to assess the metrics captured by those monitoring tools.

The True Value of Baselines: The Foundation for Service Monitoring

The challenges in collecting a baseline can be daunting, but the benefits of having a baseline are well worth the time and energy. The baseline isn’t a valuable thing in and of itself. It only gains value when you use it as the basis for service monitoring, which means observing and assessing the health of the database service in real time.

When I first started as a DBA in the early ’90s, the extent of my service monitoring was my telephone. If it rang a lot with unhappy users at the other end, I knew service was bad. If my phone didn’t ring, I assumed everything was OK. It didn’t take long to realize how foolish this approach was. For one thing, users are never happy when they call about a problem. They’re already mad things aren’t meeting their expectations. This anger rubs off on you, the DBA, by association. You lose credibility.

Instead, why not find out what’s normal for my server and then observe server behavior proactively and quantitatively? As soon as I implemented service monitoring, I found a variety of benefits, including the following:

  1. We can avoid depending on “user experience” as the key indicator for performance. This is especially important when we have specific users who might negatively impact other users (e.g., the manager who pulls resource-intensive reports at a time when everyone else expects good performance for daily data entry).
  2. We can avoid arguments with other IT teams or win those arguments by proving exactly when an issue is related to our SQL Server or is from outside the SQL Server. (We call this “the blame game.”)
  3. We can find problems even when users aren’t on the system. In the best case, we can fix the issue before users ever notice a problem. In the worst case, we can explain the issue is known and a resolution is close at hand. In either case, we retain or even build our credibility within the organization.
  4. We provide “consistency of service.” Most users get angry when we fail to meet their expectations. Part of the problem for many DBAs is they don’t know the proper expectations themselves. With a solid baseline supplemented by multiple benchmark measurements, DBAs can ensure both they and their users have consistent service and well-managed expectations.
  5. We become familiar with various types of problems arising even when the server is behaving normally. For example, we might learn about some application design problems requiring a response on our part—frequent and long blocking chains, for example. In some cases, you also discover problems that, ironically, have no response or can be safely ignored, like intermittent and brief network hiccups.

In this whitepaper, you’ll get a crash course in building a database performance baseline, collecting one or more benchmarks, incorporating your findings into service monitoring, and seeing how SolarWinds® SQL Sentry® takes the labor out of the ordeal.

Constructing Your First Baseline

In this section, we’ll walk through the steps to construct your own baseline and the most important benchmarks you’ll need to proactively manage your database systems.

The first bit of advice is to always work from a project plan, complete with a timeline, deadlines, and assigned roles. The test plan should essentially be a paper representation of everything the performance baseline or benchmark will encompass. It should walk through the test in step-by-step detail and describe the following:

  1. When and how the test environment is restored to initial state.
  2. When and how the workload is invoked and scaled.
  3. When and how the performance metrics are collected and stored.
  4. How long the test should run and when the test is considered complete.
  5. How to respond to failures or new test runs.
  6. Who is the test lead and who makes the call to abort a long-running test; this is especially important when a test fails.

I can tell you from hard-won experience that many weeks of hard work have been ruined because the person or team performing the benchmark failed to prepare properly. Such mistakes come from forgetting a cardinal rule of benchmarking: a benchmark must produce reliable and repeatable results so we can draw rational and actionable conclusions from them.

Keeping the “reliable and repeatable” mantra in mind necessitates a few extra steps, especially in benchmarks.

The goal of a baseline is to understand and measure server performance under normal conditions, while the goal of a benchmark is to understand and measure server performance under a specific, predefined workload. Does the application eat up all the CPU cycles made available to it? Does it need so much memory that paging occurs? Does the application code thrash the TempDB database? Are all the user requests serviced quickly? After you understand how the application acts under normal load, you’ll be able to answer these questions.
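
To answer the TempDB question, for example, you can watch TempDB consumption during a test run with the sys.dm_os_wait_stats companion view sys.dm_db_file_space_usage. This is a minimal sketch; page counts are in 8KB pages, so multiplying by 8 yields kilobytes:

  -- A minimal sketch: watch how TempDB space is being consumed during a test run.
  SELECT SUM(user_object_reserved_page_count) * 8     AS user_objects_kb,
         SUM(internal_object_reserved_page_count) * 8 AS internal_objects_kb,
         SUM(version_store_reserved_page_count) * 8   AS version_store_kb,
         SUM(unallocated_extent_page_count) * 8       AS free_space_kb
  FROM tempdb.sys.dm_db_file_space_usage;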

If you’ve never done any of this before, you’ll want at least three distinct measurements for a SUT:

  1. A “normal” workload (the baseline and reference point for all further benchmark tests).
  2. A high watermark.
  3. The peak maintenance workload.

This last measurement is often overlooked by administrative staff, who simply trust that all backups, index rebuilds, and DBCC checks will finish in time and with minimal interruption. But as I said before, if you don’t know what’s normal for these workloads, how can you tell when it’s abnormal?

Five Steps to Running a Successful Baseline/Benchmark Test

In the following section, we’ll look at the steps to run a successful baseline or benchmark test. There are several hallmarks of a successful database performance baseline or benchmark test:

  1. The test is managed like a project.
  2. All aspects of the SUT are fully documented.
  3. The purpose and business requirements are understood and well-documented.
  4. The collected metrics are analyzed and reincorporated into monitoring and alerting for the system.

The following steps, as shown in Figure 1, help ensure your success.

Figure 1: Five steps to run a successful baseline/benchmark test.

Step 1: Define Requirements and Specifications

When gathering a baseline or running a benchmark test, it’s important to clearly define the requirements in terms of functionality, performance, and scalability. Of course, your baseline is the set of metrics for the server under a “normal” workload. However, by restating the requirements explicitly, you ensure any assumptions built into the application or administrative approach are corrected (or at least documented).

Business requirements for the baseline/benchmark provide the DBA context for the expectations of their customers, the end users.

As a first step, define the business requirements in nontechnical terms. Some examples might be as follows:

  1. “The system can’t be down for more than 45 minutes.”
  2. “The report must return results within three seconds under a concurrent user load of up to 120 users.”
  3. “The system must sustain 500 batch requests per second without blocking.”
  4. “The data must be kept live for three months.”
  5. “The backups must be stored on the server for seven days.”

This process of defining requirements frequently helps the DBA understand the true needs of their customers (i.e., the end users) and provides important context for the technical requirements expected of their systems.
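
Technical requirements like the third example above are directly measurable. As a minimal sketch, the cumulative Batch Requests/sec counter is exposed through the sys.dm_os_performance_counters DMV, so you can sample it twice and divide by the interval (the 10-second delay here is illustrative):

  -- A minimal sketch: measure sustained batch requests per second.
  -- The counter is cumulative, so take two samples and divide by the interval.
  DECLARE @first BIGINT, @second BIGINT;

  SELECT @first = cntr_value
  FROM sys.dm_os_performance_counters
  WHERE counter_name = N'Batch Requests/sec';

  WAITFOR DELAY '00:00:10';  -- illustrative 10-second sample interval

  SELECT @second = cntr_value
  FROM sys.dm_os_performance_counters
  WHERE counter_name = N'Batch Requests/sec';

  SELECT (@second - @first) / 10.0 AS batch_requests_per_sec;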

When conducting “what-if” benchmarks, you should only ever alter one component or setting at a time.

However, many benchmarks are intended to prove the feasibility or benefit of a proposed architecture. For example, the DBA may have proposed enabling the SQL Server database compression features on an important production application. How will this affect performance? Only a careful benchmark test can reveal what’s in store. However, a key to successful “what-if” benchmarking is to alter only one important component or setting at a time. If more than one thing is changed, how can you tell which one contributed to the new measurements? Did one change help by 4x while the other hurt by 2x, leading to a seemingly acceptable net 2x improvement?

Step 2: Construct a Workload

When building your first baseline, you may not need any more workload than what’s already available on the production application. However, collecting a production workload and then replaying it on a test server is easier said than done, especially when factoring in normal user latencies like keying time and think time. Figure 2 offers a couple of shortcuts that are often good enough, especially for “what-if” benchmarks.

Figure 2: Constructing a workload.

Of course, the best database benchmarks are those capable of accurately and reliably reflecting the applications and configuration of your own database infrastructure. But the amount of work it takes to extract your own transactional workload can be immense. The easier route is to learn and run an industry-standard benchmark, such as those from the TPC, that reasonably simulates the workload of your SUT, and then extrapolate the test results to your environment.

A Good Enough Workload With HammerDB

My current favorite tool for performing TPC-like benchmarks is HammerDB, shown in Figure 3. HammerDB is a free, open-source load testing tool capable of running a workload based upon the TPC-C and TPC-H benchmarks on several different database platforms. It allows you to simulate heavy transaction processing and business intelligence workloads.

HammerDB is especially useful for understanding the performance implications of enabling or disabling specific OS or database features, alternative hardware choices, or VM architectures. But because it’s a canned workload, you must take pains to study it so you can account for the possibly great difference between its workload and your own. Your mileage may vary, as they say.

Figure 3: HammerDB testing a TPC-C workload on Microsoft SQL Server.

HammerDB is distributed via GitHub and is usually updated several times per year by the all-volunteer development team. It’s easy to learn and can drive heavy workloads against your database servers with minimal configuration or setup. It’s also possible to modify the workload HammerDB drives, so if you want to drive a heavy workload entirely of your own creation, you can, assuming you can write SQL code. If you encounter difficulties, HammerDB features great documentation and an active discussion forum where the developers are helpful and respond quickly to questions.

Better Workloads With Custom Apps

If you wish to test in-house created application workloads, your best compromise between speed of execution and realism is often to enlist one or more developers to create a variation of the client application designed to generate a randomized but realistic workload. It doesn’t need to be pretty or fancy. It just needs to be a scalable and bug-free way to generate a real SQL Server workload against the database under test using the same sort of queries and transactions real users would generate.

Best Workloads: Real User Activity Replayed With the SQL Server Database Experimentation Assistant (DEA)

In SQL Server 2012, Microsoft added the Distributed Replay Utility to run a multi-threaded workload against SQL Server to benchmark the impact of an upgrade from an older version of SQL Server to the most recent version. However, with the addition of the Database Experimentation Assistant (DEA), it also offers a fully scalable solution to simulate mission-critical SQL Server workloads, conduct performance testing, and undertake “what-if” tests and capacity planning.

Figure 4: DEA from Microsoft Docs.

DEA is composed of four main components, as shown in Figure 4. Though it’s a no-cost tool, it can be complex to work with and has several prerequisites. The DReplay controller and DReplay clients make up the first two components, controlling and generating the workload and pushing it to the target servers. The target servers—where the workload is replayed—and the Analysis Server—where benchmarking analysis is performed—make up the last two components. Since DEA uses trace files for its workload, you’ll also need to know how to create or capture SQL Server traces.
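
Traces for DEA are typically captured with a server-side trace. DEA’s documentation specifies the exact events and columns required for a replayable capture, so treat the following only as a minimal sketch of the mechanics; the file path and event selection here are illustrative assumptions:

  -- A minimal sketch of server-side trace mechanics only. Consult DEA's docs
  -- for the full set of events/columns required for replay.
  DECLARE @TraceID INT;
  DECLARE @maxfilesize BIGINT = 1024;  -- MB per rollover file

  -- Option 2 = TRACE_FILE_ROLLOVER; the path is illustrative
  EXEC sp_trace_create @TraceID OUTPUT, 2, N'D:\Traces\dea_capture', @maxfilesize, NULL;

  -- Event 12 = SQL:BatchCompleted; columns 1 = TextData, 12 = SPID, 13 = Duration
  EXEC sp_trace_setevent @TraceID, 12, 1, 1;
  EXEC sp_trace_setevent @TraceID, 12, 12, 1;
  EXEC sp_trace_setevent @TraceID, 12, 13, 1;

  EXEC sp_trace_setstatus @TraceID, 1;  -- 1 = start the trace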

Step 3: Isolate and Document

It seems like an obvious requirement, but the SUT should be isolated from other production, development, or test environments. Sure, it sounds easy enough. But in today’s world of consolidated disk storage and consolidated servers—not to mention virtual machines—this can be expensive and difficult.

If possible, you want your benchmarking environment to share no components with other active systems. As you can guess, this need for isolation is one of the most expensive elements of a good baseline or benchmark. (The other expense is the hours invested in the project.) Imagine, for example, you were benchmarking on a computer also serving as a user’s workstation. How could you be certain a specific performance anomaly wasn’t actually the user’s Outlook® application synchronizing with the Exchange server? This sort of shared environment can completely invalidate a benchmark.

Other elements of isolation usually necessary are as follows:

  1. Separate servers to generate a workload against the benchmarked environment.
  2. Separate servers to collect and store performance metrics.

Figure 5: Isolate and document the SUT.

I’ve seen many teams attempt to analyze their benchmark results only to discover they have no idea what the Windows® registry and other settings are for the server they were testing. Did the server enable Lock Pages in Memory? Was the NIC set with Jumbo Frames? Which CPU power plan was the server using? Clearly, it’s critical to fully document the SUT. Be sure to capture all the following bits of information in a document (this is also shown in Figure 5):

  1. Windows configuration details, including the Windows Registry.
  2. Hardware configuration details, including memory, storage subsystem, CPU (including power settings and hyperthreading), NIC settings, and HBA settings.
  3. SQL Server configuration settings at the instance, database, and connection levels. Also, be sure to catalog the sp_configure settings and the database-level options for the user and system databases (a quick way to capture these is sketched after this list).
  4. The database schema, including index statistics and metadata.
  5. The SQL transactional workload, including validation of the SQL transactional workload for the test at hand. I’ve encountered many teams who, upon analysis of their benchmarks, realized they were running boatloads of SELECT statements for a benchmark of write speed. Whoops!
  6. Full descriptions of all of the metrics and instrumentation used to monitor the benchmark. Some tests can show different results simply because the second test run gathered different performance metrics with a lighter overhead.

You may find the Microsoft Assessment and Planning (MAP) Toolkit helpful in rapidly collecting and synthesizing a strong core of documentation about your SQL Server and Windows Server infrastructure.
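
As a head start on item 3, a minimal sketch like the following captures the instance-wide and per-database settings in one pass, ready to paste into your test document:

  -- A minimal sketch: snapshot instance- and database-level settings.
  EXEC sp_configure 'show advanced options', 1;
  RECONFIGURE;
  EXEC sp_configure;  -- all instance-wide configuration values

  -- Key per-database options from the catalog
  SELECT name,
         compatibility_level,
         recovery_model_desc,
         page_verify_option_desc,
         snapshot_isolation_state_desc,
         is_read_committed_snapshot_on
  FROM sys.databases;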

Step 4A: Run the Test

If you constructed the workload properly, running the test itself is pretty easy, as shown in Figure 6.

Figure 6: Tips for a successful baseline/benchmark test run.

There are a couple of tricky parts for a novice. First, a rather long and unpleasant learning experience for me was figuring out an effective way to quickly refresh the entire test environment. In those days, we had to restore a rather large database, reboot all the servers, and reinitialize our monitoring instruments. Though you can still use this approach, there are newer and faster alternatives, such as database snapshots and—my current favorite—reverting to VM snapshots.
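
Reverting from a database snapshot, for example, takes seconds rather than the minutes or hours a full restore can take. Here’s a minimal sketch with illustrative database and file names (the logical name must match the source database’s data file):

  -- A minimal sketch: create a database snapshot before the test run...
  CREATE DATABASE BenchDB_Snap
  ON (NAME = BenchDB_Data,  -- must match the source database's logical file name
      FILENAME = 'D:\Snapshots\BenchDB_Snap.ss')
  AS SNAPSHOT OF BenchDB;

  -- ...then revert the test database between runs. Reverting requires that
  -- this be the only snapshot of BenchDB and that no users are connected.
  RESTORE DATABASE BenchDB FROM DATABASE_SNAPSHOT = 'BenchDB_Snap';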

Don’t be afraid to abort the test if things go awry early on. And, as mentioned earlier, only change one aspect of the SUT per test iteration.

Step 4B: Monitoring While the Baseline/Benchmark Test Is Running

You must determine which server you’ll use to monitor your SQL Server and the location of any monitoring log files. The best practice is to monitor the SUT remotely, so it isn’t encumbered by the overhead of collecting metrics and/or writing to the same I/O subsystem used by SQL Server.

Because of these performance concerns, the use of performance counters may need a bit of tuning. You’ll need to carefully evaluate the number of metrics and polling frequency offering the best mix of insight versus overhead. For the initial baseline, however, it’s recommended to use as many counters as desired with the highest frequency available. Subsequent tests can collect fewer metrics and/or at a lower polling frequency. Once you have some experience evaluating the resulting data, you’ll probably begin to limit the collections. In the absence of any other recommendations, collect the following throughout the baseline or benchmark test:

  1. The most salient PerfMon counters detailing overall server resource utilization, with a tight polling frequency of every three to 30 seconds, depending on your needs. Refer to Appendix A for a list of suggested PerfMon objects and counters when not using SolarWinds SQL Sentry.
  2. The top twenty non-system wait statistics showing SQL Server behavior, using the sys.dm_os_wait_stats dynamic management view (DMV), every one to five minutes (a minimal collection script is sketched after this list). Don’t forget to leave out benign queue and timer waits, as illustrated by Paul Randal in “SQL Server Wait Statistics (Or Please Tell Me Where It Hurts…).” For a quick list of some important wait statistics, refer to Appendix B.
  3. In some cases, depending on the goal of the benchmark, additional DMVs are helpful, especially when correlating their results against wait statistics and PerfMon counters. For example, you may also choose to query every few minutes from the following:
    1. The sys.dm_os_memory_cache_* views, sys.dm_os_memory_clerks, and sys.dm_os_process_memory for all things memory-related.
    2. sys.dm_exec_xxx_stats for individual query, procedure, and trigger usage statistics.
    3. sys.dm_os_threads to figure out thread memory consumption.
    4. sys.dm_exec_requests to gauge blocking and wait times and reasons for specific queries.

Don’t forget about TempDB! TempDB is a major source of resource consumption and contention on many multi-database systems. Collect information about TempDB performance as you would any other production database when you test.
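
A minimal sketch of item 2 follows. The logging table name and the short exclusion list are illustrative, not exhaustive; Paul Randal’s article provides a far more complete list of benign waits to filter out:

  -- A minimal sketch: log the top 20 non-benign waits with a timestamp.
  IF OBJECT_ID(N'dbo.WaitStatsLog') IS NULL
    CREATE TABLE dbo.WaitStatsLog (
      capture_time   DATETIME2    NOT NULL,
      wait_type      NVARCHAR(60) NOT NULL,
      waiting_tasks  BIGINT       NOT NULL,
      wait_time_ms   BIGINT       NOT NULL,
      signal_wait_ms BIGINT       NOT NULL
    );

  INSERT dbo.WaitStatsLog (capture_time, wait_type, waiting_tasks, wait_time_ms, signal_wait_ms)
  SELECT TOP (20)
         SYSDATETIME(), wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
  FROM sys.dm_os_wait_stats
  WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'BROKER_TASK_STOP',
                          N'XE_TIMER_EVENT', N'REQUEST_FOR_DEADLOCK_SEARCH',
                          N'LOGMGR_QUEUE', N'CHECKPOINT_QUEUE', N'WAITFOR')
  ORDER BY wait_time_ms DESC;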

Monitoring Best Practices With Native Tools

  1. If performing a high watermark benchmark, you may wish to run any special SQL Server monitoring queries using a Dedicated Admin Connection. Many restrictions apply.
  2. When testing a SQL Server upgrade, monitor the PerfMon counter SQL Server: Deprecated Features – Usage (the same data is available from T-SQL; see the query after this list).
  3. When monitoring overhead is high, you can do the following:
    1. Minimize the number of counters you’re monitoring. Less monitoring equals less overhead.
    2. Record monitoring logs on a different machine than the SUT.
    3. Adjust the polling frequency when overhead is high. You get less frequent data points but a commensurate reduction in overhead.
    4. Don’t monitor more metrics than necessary.
    5. When experiencing extreme overhead, send logs directly to disk without watching the metrics, since displaying the metrics can add latency.
    6. Don’t use Profiler at all. Only use server-side traces when necessary, with filters in place to limit the trace to the database under test and TempDB.
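
For item 2, the deprecated-features counter is also exposed inside SQL Server through sys.dm_os_performance_counters, which is handy when you’re already collecting there. A minimal sketch:

  -- A minimal sketch: check deprecated feature usage from T-SQL rather than
  -- PerfMon. A non-zero cntr_value means the feature has been used since startup.
  SELECT instance_name AS deprecated_feature,
         cntr_value    AS usage_count
  FROM sys.dm_os_performance_counters
  WHERE object_name LIKE N'%Deprecated Features%'
    AND cntr_value > 0
  ORDER BY cntr_value DESC;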

Step 5: Assess and Interpret

Long experience at assessing and interpreting the results of baselines has taught me not to rely on large tables of data with many data points; the mind glazes over quickly. Don’t throw away the source data held in the monitoring log files, but do use Microsoft Excel® to consolidate the monitoring log data into a single graphic representation you can understand. A good chart provides enough information to interpret the results and makes it easy to see the overall trend of the performance metrics.

Write a summary document to record your findings!

The summary document is the second most important product of a baseline or benchmark test. For baselines, the document should summarize things like common statistical values for important metrics, including minimum, maximum, average, median, and mode values. For benchmarks, the document should also summarize any important extrapolations and trends learned in the test. For example, in a benchmark to see how the current server performs with database compression enabled, the document might summarize that for every X amount of buffer used with compression enabled, CPU utilization increases by Y percent.
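
If your captured metrics end up in a SQL Server table, most of those statistics fall out of a few short queries. A minimal sketch, assuming an illustrative dbo.MetricLog table holding one row per sample:

  -- A minimal sketch: summary statistics for one logged metric.
  -- dbo.MetricLog (metric_name, capture_time, metric_value) is illustrative.
  SELECT MIN(metric_value) AS min_value,
         MAX(metric_value) AS max_value,
         AVG(metric_value) AS avg_value
  FROM dbo.MetricLog
  WHERE metric_name = N'Batch Requests/sec';

  -- Median requires a windowed percentile in T-SQL (SQL Server 2012+).
  SELECT DISTINCT
         PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY metric_value) OVER () AS median_value
  FROM dbo.MetricLog
  WHERE metric_name = N'Batch Requests/sec';

  -- Mode: the most frequently observed value.
  SELECT TOP (1) metric_value AS mode_value
  FROM dbo.MetricLog
  WHERE metric_name = N'Batch Requests/sec'
  GROUP BY metric_value
  ORDER BY COUNT(*) DESC;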

The summary document should also consider the impact of important system processes. For example, the SUT might be able to sustain a higher number of concurrent users except during the preventative maintenance cycle due to resources required for processes like backups, DBCC processes, or index maintenance. Similarly, the document should consider important business cycles, such as the end of month accounting close or an important peak usage time.

Step 6: After Testing

Many IT teams make the mistake of thinking about their baselines and benchmarks as self-contained artifacts. However, you’ll find the real value of both baselines and benchmarks when their findings are used to fortify monitoring and alerting for your SQL Server instances. Your baseline provides context for alerting and reacting to problems.

If you haven’t already done so, it’s urgent for you to implement an ongoing, 24/7 database monitoring and alerting system based on your findings. Find out how SolarWinds SQL Sentry can help you today!

Baselining in SQL Sentry

After spending all this time describing how to collect a baseline and how to run a performance benchmark, here’s some wonderful news: if you use SQL Sentry, you don’t have to do either one. SQL Sentry comes with built-in temporal baselines visible in the upper right corner of the dashboard, as shown in Figure 7.

Figure 7: Temporal baseline drop-down menu on the SQL Sentry dashboard.

Typically, you can choose between one of three built-in baselines. It’s a smart interface, however, and the built-in baselines available from the list change depending on the active date range (i.e., the Range Start and Range End in the upper-left corner of Figure 7). For example, if you’re viewing a three-day range of performance data in the dashboard, the Previous Day baseline doesn’t make sense to display, so we don’t provide the option. Also, the Range option is currently only available when viewing detail data (range <= 30 minutes); otherwise, the Average option is displayed as an overlay in the dashboard.

In Figure 8, you can see a Previous Day baseline with a Range style overlay on the SQL Sentry dashboard. You can see patterns emerge on several charts, with spikes in CPU, wait statistics, and database I/O just after 8:00 a.m. The dashed lines and the shaded area between them, best illustrated in the Log Flushes graph at the bottom right of the dashboard, present a moving average showing the min/max range of normal values for this time of day. As it turns out, the spikes in waits, CPU, etc. are well within normal behavior for this time of day. Of course, normal doesn’t necessarily mean acceptable. This functionality is simply not possible with the native Microsoft tools.

Figure 8: SQL Sentry dashboard with previous day temporal baseline overlay.

When looking at longer time ranges—or when you choose it by selecting the Average radio button in the upper-right corner—the SQL Sentry dashboard will display the average value in each graph as a dashed line. As is normal for SQL Sentry, you can drill deeper into the interface or Jump To other sections of the interface, such as the Top SQL or Disk Activity tabs.

Custom Baselines

In the database industry, a benchmark is a test in which you collect the performance metrics of a system under a specific, predefined workload. SQL Sentry doesn’t have a built-in workload generation feature, but what better indicator of performance under load could there be than metrics captured while a specific, predefined workload was actually running? In other words, benchmarks in the general industry equal custom baselines in SQL Sentry.

In addition to the built-in temporal baselines, you can create your own custom baselines specifically calibrated against specific time periods and workloads. To do so, simply click and drag to select a date range on any of the charts, right-click to invoke the context menu, and then select the Create Baseline menu option, as shown in Figure 9.

Figure 9: Creating a custom baseline.

When you click the Create Baseline menu option, the Edit Baseline screen appears. This shows the Average, Min, Max, and Standard Deviation for each dashboard metric over the specified time range, as shown in Figure 10.

Here, we’ll call this custom baseline "Business Hours" to represent normal and acceptable performance during this time. You can also select the checkbox in the Default column to choose which metrics are shown by default on multi-metric charts. Additionally, you can manually adjust the baseline value by editing a specific value in the Average column. For example, you might decide the average value for Page Life Expectancy (17,423.0) isn’t representative and simply change it to your preferred value (just not 300, please). Clicking OK closes the dialog box and saves any changes.

Figure 10: Edit Baseline dialog box.

Once saved, the baseline is added to the Baseline drop-down list, as you can see in Figure 11, where the “Business Hours” custom baseline appears. When you select your new custom baseline, a horizontal line appears on each chart showing the baseline value for each metric. The line provides a reference point so you can easily spot unusual activity.

In Figure 11, you can also see at around 10:10 a.m., something started impacting disk waits, CPU, PLE, and page reads, causing them to deviate from the normal business hours level. Time to investigate!

Figure 11: Displaying a custom baseline on the SQL Sentry dashboard.

From here, you can highlight the range and Jump To other areas like Top SQL, Processes, or the Event Calendar to diagnose the root cause.

There are many other use cases like this for custom baselines. Here are a few:

  1. Create a baseline prior to making a configuration change on the server, then use it to immediately monitor how the change impacts performance.
  2. Create a baseline for intensive maintenance operations like integrity checks, index rebuilds, or backups and periodically review the operation against its baseline to ensure it’s still performing as expected. I’ve seen many cases where a change to the network or the SAN by some other group had a dramatic impact on maintenance ops, and this is a great way to catch it.
  3. Create periodic date-based baselines, such as “Jan 2021,” “Mar 2021,” etc., and use them to assess how performance changes over time.

You can create as many custom baselines as you want and edit them at any time by clicking the build button to the right of the drop-down menu. For multi-metric charts, you can change the default metric shown without editing the baseline by right-clicking it in the legend.

Integrating Baselines and Alerting

These visualization capabilities are great, but the feature wouldn't be complete without the ability to generate alerts using baseline metrics. This is where the other SQL Sentry functionality comes in. When a custom baseline is created at the Windows Server or SQL Server level, it becomes available for use in Advisory Conditions at that level.

Using baselines in an Advisory Condition is easy. Instead of setting an explicit threshold value on the right side of a comparison, select Performance Counter Baseline, and its value for the metric is automatically substituted. This way, if a baseline value ever changes, you won't need to update any conditions.

In the Advisory Condition shown in Figure 12, we've added three separate baseline tests with some basic math to control the amount of deviation from the baseline value required to trigger the alert.

Figure 12: Creating an Advisory Condition based on the Business Hours custom baseline.

The system knows to use the Business Hours baseline value for the counter on the left side of the comparison. Basically, the interface uses context-aware shorthand that will trigger the alert whenever Total SQL Server Waits are three or more times greater than the baseline, when PLE is half the baseline, or when Total Server Processor Time is twice the baseline.

(Note: it’s possible to select a baseline on the left side, but you’ll forgo the shorthand when doing so. Since it generally reads easier with the counter on the left and the threshold on the right, we recommend this approach anyway.)

Baseline and Benchmark Testing Best Practices With the Native Tools

No matter the size of your SQL Server estate, remember you’ll benefit strongly from baselines and the contextual information they provide about your servers.

  • Collect a baseline of metrics so you can see how your server acts under normal conditions.
  • Perform at least two more performance benchmarks:
    • One to determine the high watermark for the server’s performance.
    • And another to see the server’s behavior during peak preventative maintenance activity.
  • Write a detailed test plan and conduct the test like a well-managed project.
  • Stabilize and fully document the SUT, from the server all the way through to the application.
  • Build a true database workload and make sure it’s as realistic as possible.
  • Run and monitor the test from other hosts. Record and store the monitoring logs apart from the SUT.
  • Collect only the metrics necessary for the test at a polling frequency that doesn’t incur a lot of overhead.
  • Analyze and document your findings in a summary document. Use charts and other data visualizations to drive home any lessons learned.
  • Incorporate the findings of a baseline or benchmark test into your service monitoring regimen. If you don’t have one, implement one using the native tools immediately.

Conclusion

A baseline shows you what SQL Server looks like when it’s acting normally. It provides you with an empirical context for building a comprehensive service monitoring regimen. This enables you to provide exceptional levels of customer service, uptime, and proactive management for your SQL Server infrastructure. Without a baseline, you risk many potential problems with your SQL Server instances, including the following:

  • Having no idea about whether a particular metric is meaningful or useless for a given SQL Server and application.
  • A lack of understanding of whether your server’s behavior is in keeping with its normal trend.
  • Depending upon user experience to know when a SQL Server instance is having a problem, which can result in a loss of credibility and personal prestige.
  • Losing “the blame game” to other departments of the IT organization.
  • Providing inconsistent service to customers.
  • Completely missing a problem or issue signaled by an unusual surplus of capacity. For example, an unusually idle server might indicate a service, process, or group of users can’t get to the server to do their normal work.

If it’s normal for your server to be heavily utilized and then, suddenly, it’s lightly utilized one day, you can tell something is out of the ordinary. Perhaps the network is having trouble and the users—your customers—aren’t connecting to your server for some reason.

In contrast, a benchmark shows you SQL Server performance under a specific, predefined workload. This can help you determine the SQL Server instance’s maximum performance capabilities and the specific limitations of its capabilities, such as the maximum number of users it can concurrently support.

SQL Sentry turns the long and arduous process of creating baselines and benchmark measurements into a simple process.

Though it’s possible to incorporate any baseline or benchmark findings into the alerting of the SQL Server native tools, it requires a great deal of manual work, Transact-SQL coding, and possibly application coding. Plus, using the SQL Server native tools requires you to manually conduct a database performance baseline or benchmark test, assess the findings, and manually build out an entire set of alerts based upon those findings. Using SQL Sentry, you can automatically collect real system performance measurements, turn them into baselines, and reference them in Advisory Conditions to trigger the appropriate alerts. This helps you make sure you’re only alerted to truly significant situations important to the monitored host.

SQL Sentry doesn’t just provide built-in temporal baselines—it also provides the ability to create custom baselines (in essence, specific performance benchmarks). Additionally, SQL Sentry baseline metrics can be used as the basis for all your alerting. Though other alerting systems struggle with these problems, SQL Sentry is designed to handle them all from the ground up—quickly, directly, and easily. SQL Sentry also makes it easy to scale out alerting throughout your IT enterprise by enabling you to set baseline-driven alerts for as many or as few servers in your global IT environment as you’d like.

For more details about the advanced alerting and baselining capabilities in SQL Sentry, refer to the baseline blog posts here.

Finally, for a free 14-day trial of SQL Sentry, click here.

Appendix A: PerfMon Objects and Counters Useful for SQL Server Baselining and Benchmarking

When configuring PerfMon for a baseline or benchmark test, consider using the following objects and counters. Set the PerfMon session to record its log file to a location other than the system under test. Choose the lowest polling frequency acceptable for your needs, keeping in mind the added overhead created by frequent polling. A typical polling interval might be anywhere from every three seconds to every 30 seconds (or even every minute) for some baseline or benchmark tests.

Each entry below lists the PerfMon object and counter, a rule of thumb, and notes.

  • Paging File – % Usage (rule of thumb: < 70%): The amount of page file currently in use. High page file usage indicates memory shortages so severe they must be spilled to the I/O subsystem, resulting in a serious performance bottleneck.
  • Processor – % Processor Time (<= 80%): The amount of total CPU in use across all system activities. The higher this metric is, the more likely users will be delayed.
  • Processor – % Privileged Time (< 40% of % Processor Time): The amount of time spent executing kernel commands (i.e., real work) like SQL Server I/O requests. This value should seldom rise above 40% of total % Processor Time.
  • Process (sqlservr), Process (msmdsrv) – % Processor Time (< 80%): The amount of CPU used solely by SQL Server (sqlservr) and/or Analysis Services (msmdsrv) process threads. It’s even more important to monitor these metrics if the server is running both services simultaneously. On dedicated servers, you may monitor only the single service in use on the machine.
  • System – Processor Queue Length (< 2 per CPU): The single best indicator of CPU pressure on Windows Server. It shows how many threads are currently waiting to be scheduled on a CPU. Less than 12 per CPU is good/fair, less than eight is better, and less than two is best.
  • System – Context Switches/Sec (< 1,500): The number of execution contexts switched in the last second. High-end hardware can now handle high amounts of context switching, but in the absence of any other indicators, greater than 6,000 is a busy system, less than 6,000 is fair, less than 3,000 is good, and less than 1,500 is excellent.
  • Physical Disk – Avg ms/read (< 8): The time, in milliseconds, spent waiting to complete a read operation on average, also known as read latency. The most common physical bottleneck is usually the I/O subsystem. The Microsoft guidance for HDD-based systems says a read latency of greater than 20 is poor, less than 20 is good/fair, less than 12 is better, and less than eight is best.
  • Physical Disk – Avg ms/write (< 1 with a write cache): The time, in milliseconds, spent waiting to complete a write operation on average, also known as write latency. As with the previous metric, the Microsoft guidance for HDD-based systems says a write latency of greater than four is poor, less than four is fair, less than two is better, and one or less is best on systems with a write cache. On I/O subsystems without a write cache, a write latency greater than 20 is poor, less than 20 is fair, less than 12 is better, and eight or less is best.
  • Memory – Available MBytes (>= 300): The amount of physical memory, in megabytes, available to run processes on the machine. You generally want a few hundred megabytes available to handle the random file copy or similar activity needed for system administration.
  • SQL Server: Memory Manager – Memory Grants Pending (~0): The current number of processes waiting for a workspace memory grant within SQL Server. You want this value to be close to zero most of the time. If it’s frequently greater than zero, SQL Server needs more memory or is running T-SQL code wasteful of memory.
  • SQL Server: Buffer Manager – Free List Stalls/Sec (< 2): The frequency at which requests for data buffer pages are suspended (aka “stalled”) because there are no free data cache buffers. If this metric is greater than two, SQL Server needs more memory or is running T-SQL code wasteful of memory.
  • SQL Server: Databases – Log Flush Waits/Sec (~0): Effectively, the number of times per second SQL Server must wait for pages to be written to the transaction log. Only highly stressed systems should have a sustained value greater than zero.
  • SQL Server: General Statistics – User Connections (proportionate to Batch Requests/Sec): The number of users currently connected to the SQL Server. For most workloads, it should roughly track with Batch Requests/Sec, rising and falling together. Taken with Batch Requests/Sec, it’s a good indicator of the overall busyness of a SQL Server.
  • SQL Server: Buffer Manager – Stolen Pages/Sec (proportionate to Batch Requests/Sec): The number of 8KB pages “stolen” from the data cache to satisfy other memory needs, like plan cache and workspace memory. It’s a good indicator of the rate of data flow among SQL Server caches and should remain proportionate to Batch Requests/Sec.
  • SQL Server: SQL Statistics – Batch Requests/Sec (proportionate to workload): The overall busyness of a SQL Server (i.e., the number of batch requests the database engine receives per second). High-end servers running modern versions of SQL Server can sustain tens or even hundreds of thousands of batch requests per second on a well-tuned workload. (Note: this counter doesn’t track all Hekaton transactions.)
  • SQL Server: SQL Statistics – SQL Compilations/Sec (< 10% of Batch Requests/Sec): The number of times T-SQL compilations occurred per second, including recompiles. In general, for OLTP workloads, this metric should remain proportionate to the number of Batch Requests/Sec. High values may indicate excessive ad hoc querying due to poorly coded T-SQL.
  • SQL Server: SQL Statistics – SQL Re-Compilations/Sec (< 10% of SQL Compilations/Sec): The number of times per second T-SQL compiled code attempted to execute but had to be recompiled before completion. This number is preferably at or near zero, since recompiles can cause deadlocks and exclusive compile locks. For most workloads, it should remain in proportion to Batch Requests/Sec and SQL Compilations/Sec.

This list is for quick reference only. For a comprehensive list of performance metrics used by SQL Sentry, check out this SQL Sentry documentation article.

Appendix B: Top Wait Statistics Indicative of Performance Issues

The following wait statistics may be useful when creating a baseline or running a benchmark test. You need only collect the appropriate data from the sys.dm_os_wait_stats DMV while running your tests. Be certain to also record the time at which each collection is made: wait statistics are cumulative, so only the deltas between timed collections reveal how waits grow over time (a minimal sketch of that arithmetic follows this paragraph). As with collecting PerfMon objects and counters during a baseline or benchmark test, record the results to a location not on the system under test. In addition, recognize the polling frequency adds overhead. Collect only what you need at a somewhat lower frequency than you might with PerfMon object counters—every one to 15 minutes, for example. Here are a few details about the top wait statistics:
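
Reusing the illustrative dbo.WaitStatsLog table from Step 4B, the delta arithmetic might look like this:

  -- A minimal sketch: turn cumulative wait times into per-interval deltas.
  -- Negative deltas mean the server (or the stats) was reset between samples.
  SELECT wait_type,
         capture_time,
         wait_time_ms
             - LAG(wait_time_ms) OVER (PARTITION BY wait_type
                                       ORDER BY capture_time) AS wait_ms_delta
  FROM dbo.WaitStatsLog
  ORDER BY wait_type, capture_time;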

CPU Pressure

  • SOS_SCHEDULER_YIELD: Indicates waits caused by CPU pressure due to frequently yielding a scheduler among threads.
  • CXPACKET: Represents the amount of wait time consumed by parallelized operations. This isn’t usually an indicator of “bad” behavior, although the default choice in SQL Server is to aggressively parallelize operations.

Locking

  • LCK_M_S, LCK_M_U, and LCK_M_X: Show the time spent waiting on long-term blocking from shared, update, and exclusive lock requests, respectively. Often an indicator of poorly coded T-SQL, long-running transactions, or overly strict isolation levels.

Memory

  • PAGELATCH_X: Shows the time spent waiting for a data buffer latch. A high value may indicate a shortage of memory for the SQL Server data buffer.
  • LATCH_X: Represents the time spent waiting for a non-data buffer latch, such as in the plan cache. A high value may indicate a shortage of memory for the SQL Server non-data buffer.
  • RESOURCE_SEMAPHORE: Shows the time queries spend waiting for a workspace memory grant within SQL Server before they can execute. A high value may indicate too many concurrent memory-hungry queries or overall memory pressure on the instance.

I/O

  • PAGEIOLATCH_X: This is the time spent waiting for data to be loaded from the I/O subsystem to the SQL Server data cache buffers. High values may indicate either a slow I/O subsystem or inadequate memory to hold the amount of data needed by the workload.
  • WRITELOG and LOGBUFFER: This is the amount of time spent managing the transaction log I/O subsystem. High values in either counter may indicate the SQL Server database transaction log is on a slow or overworked I/O subsystem.
  • ASYNC_IO_COMPLETION and IO_COMPLETION: These wait statistics show the amount of time spent waiting for general I/O operations to complete. High values may indicate a slow or overworked I/O subsystem.

Network Pressure

  • ASYNC_NETWORK_IO: Indicates the overall time SQL Server spends waiting for asynchronous network I/O, the most commonly used sort of network I/O. High values may indicate an inadequate or overused NIC.
