The ability to conduct a database performance test to establish baseline measurements and workload performance benchmarks is one of the skills that most clearly distinguishes senior DBAs (database administrators) from junior DBAs. Once DBAs are proficient in baselining and benchmarking database performance, they hate to operate without them because they are so valuable.
First, a quick explanation of the differences between a baseline and a benchmark, plus some more bits of nomenclature.
Just like in the health example, the baseline metric(s) define what is normal for the database application’s performance and, intuitively, we know that we can only identify what is abnormal about a SQL Server by first establishing what is normal. For a DBA, few metrics are more valuable or have a more far-reaching impact than a baseline.
The purpose of baselining and benchmarking is to know, definitively, what is normal and abnormal for your SQL Server. Without that knowledge, you are forced to rely on broad rules of thumb and hunches.
There are a number of reasons that few DBAs master performance baselining and benchmarking. We could say, “It’s hard!” and leave it at that. We can also very safely say that DBAs are so busy coping with the most urgent issues of the day that there is no time left over to conduct a time- and resource-intensive set of tests. Both of those statements are very true, but they don’t go far enough. Instead, let’s take a moment to more fully characterize the most challenging issues with creating baseline metrics and benchmarks.
First, the ability to build baselines and benchmarks requires much knowledge about:
After all, if you’re trying to construct a performance baseline for a given database application, but you change even one aspect of the underlying hardware, you could reasonably expect an entirely different set of values from the baseline.
Second, the DBA must also possess a thorough understanding of:
Moreover, as if that were not enough, the DBA also needs to know a good deal about monitoring the system under test (SUT), as well as how to assess the metrics captured by those monitoring tools.
The challenges in collecting a baseline can be daunting! However, the benefits of having a baseline are well worth the time and energy. The baseline isn’t a valuable thing in and of itself. It only gains value when you use it as the basis for service monitoring; that is, observing and assessing the health of the database service in real-time.
When I first started as a DBA in the early ‘90s, the extent of my service monitoring was my telephone. If it rang a lot with unhappy users at the other end, then I knew service was bad. If my phone didn’t ring, I assumed that everything was ok. It did not take long to realize how foolish that approach was. For one thing, users are never happy when they call about a problem. They are already mad that things are not meeting their expectations. That anger rubs off on you, the DBA, by association. You lose credibility.
Instead, why not find out what is normal for my server, and then observe server behavior proactively and quantitatively? As soon as I implemented service monitoring, I found a variety of benefits:
In this white paper, you’ll get a crash course in building a database performance baseline, collecting one or more benchmarks, incorporating your findings into service monitoring, and using the SQL Sentry tools to take the labor out of the entire process.
In this section, we will take you through the steps to construct your baseline, as well as the most important benchmarks you will need to manage your database systems proactively.
The first bit of advice is to always work from a project plan, complete with a timeline, deadlines, and assigned roles. The test plan should essentially be a paper representation of everything that the performance baseline or benchmark will encompass. It should walk through the test in step-by-step detail describing:
I can tell you from hard-won experience that many weeks of hard work have been ruined because the person or team performing the benchmark failed to prepare adequately. Such mistakes come from forgetting a cardinal rule of benchmarks: a benchmark must produce results that are both reliable and repeatable, so that we can draw conclusions that are rational and actionable.
Keeping the “reliable and repeatable” mantra in mind necessitates a few extra steps, especially in benchmarks.
The goal of the baseline is to understand and measure server performance under normal conditions, while the goal of a benchmark is to understand and measure server performance under a specific, pre-defined workload. Does the application eat up all the CPU cycles made available to it? Does it need so much memory that paging occurs? Does the application code thrash the disk?
If you have never done any of this before, you will want at least three distinct measurements for a SUT:
This last measurement is often overlooked by administrative staff, who simply trust that all backups, index rebuilds, and DBCC checks can finish in time and with minimal interruptions. However, as we said before, if you don’t know what is normal for these workloads, how can you tell when it’s abnormal?
In the following section, we will cover the steps to run a successful baseline or benchmark test. There are a couple of hallmarks of a successful database performance baseline or benchmark test:
The following steps, as shown in Figure 1, will take you through a successful baseline or benchmark test.
Figure 1: Five Steps to Run a Successful Baseline/Benchmark Test
When gathering a baseline or running a benchmark test, it is important to clearly define the requirements in terms of functionality, performance, and scalability. Of course, your baseline is the set of metrics for the server under “normal” workload. However, by restating the requirements explicitly, you ensure that any assumptions built into the application or administrative approach are corrected, or at least documented.
Business requirements for the baseline/benchmark provide the DBA context for the expectations of their customers, the end users.
As a first step, define the business requirements and in non-technical terms. Some examples might be:
This process of requirements definition frequently helps the DBA to understand the real needs of their customers, i.e., the end users, and provides important context for the technical requirements expected of their systems.
When conducting “what if” benchmarks, only ever alter one component or setting at a time.
However, many benchmarks are intended to prove the feasibility or benefit of a proposed architecture. For example, the DBA may have proposed enabling SQL Server’s database compression features on a critical production application. How will that affect performance? Only a careful benchmark test can reveal what is in store. A key to successful “what if” benchmarking, however, is to alter only one important component or setting at a time. If more than one thing is changed, how can you tell which change contributed to the new measurements? Did one change improve performance fourfold while the other cut it in half, netting a seemingly acceptable twofold improvement?
When building your first baseline, you may not need any more workload than what your production applications already generate.
Figure 2: Constructing a Workload
Of course, the best database benchmarks are those that accurately and reliably reflect the applications and configuration of your own database infrastructure. On the other hand, the amount of work that goes into extracting your transactional workload can be immense. The easier route is to learn and run industry-standard benchmarks, such as those of the Transaction Processing Performance Council (TPC), which reasonably simulate the workload of your SUT, and then extrapolate the test results to your environment.
My current favorite tool for performing TPC-like benchmarks is HammerDB, shown in Figure 3. HammerDB is a free, open-source load testing tool, which can run a workload based upon the TPC-C and TPC-H benchmarks on several different database platforms. It allows you to simulate heavy transaction processing and business intelligence workloads.
HammerDB is especially useful for understanding the performance implications of enabling or disabling specific OS or database features, alternative hardware choices, or VM architectures. However, because it is a canned workload, you must take pains to study that workload so that you can effectively account for the possibly significant differences between its workload and your own. Your mileage may vary, as they say.
Figure 3: HammerDB testing a TPC-C workload on Microsoft SQL Server
HammerDB is distributed via Sourceforge and is usually updated several times per year by the all-volunteer development team. It is easy to learn and can drive very heavy workloads against your database servers with minimal configuration or setup. It is possible to modify the workload HammerDB drives; so, if you want to drive a heavy workload that is closer to your own application’s behavior, you can customize it to do so.
If you wish to test in-house created application workloads, often your best compromise between speed of execution and realism is to enlist one or more developers to create a variation of the client application which generates a randomized but realistic workload. These do not need to be pretty or fancy. They just need to be a scalable and bug-free way to generate real SQL Server workload against the database under test using the same sort of queries and transactions that real users would create.
Starting in SQL Server 2012, Microsoft added the Distributed Replay Utility to run a multi-threaded workload against SQL Server, initially for benchmarking the impact of an upgrade from one version of SQL Server to another. However, it also offers a fully scalable solution to simulate mission-critical SQL Server workloads, conduct performance testing, and undertake “what-if” tests and capacity planning.
Figure 4: Distributed Replay Architecture from MSDN
Distributed Replay is composed of four main components, as shown above in Figure 4 and, while free, it can be complicated to work with. The administration tool and workload controller make up the first two components, while the client(s) that generate the workload and push it to the target server make up the last two components. Since Distributed Replay uses trace files for its workload, you’ll also need to know how to create or capture SQL Server traces.
More information about the Distributed Replay Utility is available:
It seems like an obvious requirement: the SUT should be isolated from other production, development, or test environments. Sure, it sounds easy enough. But in today’s world of consolidated disk storage and consolidated servers, not to mention virtual machines, isolation can be both expensive and difficult.
If possible, you want your benchmarking environment to share literally no components with other active systems. As you can guess, this need for isolation is one of the most expensive elements of a good baseline or benchmark. (The other expense is the person-hours invested in the project.) Imagine, for example, that you were benchmarking on a computer that was also a user’s workstation. How could you be sure that a specific performance anomaly wasn’t actually the user’s Outlook application synchronizing with the Exchange server? That sort of shared environment can completely invalidate a benchmark.
Other elements of isolation that are usually necessary are 1) separate servers to generate a workload against the benchmarked environment, and 2) separate servers to collect and store performance metrics.
Figure 5: Isolate and Document the System under Test (SUT)
I’ve seen many teams attempt to analyze their benchmark results, only to discover that they have no idea what the Windows registry and other settings are for the server they were testing on. Did the server enable Lock Pages in Memory? Was the NIC set with Jumbo Frames? Which power plan was the server using? Clearly, it is critical to fully document the SUT. Be sure to capture all of the following bits of information in a document, as also shown above in Figure 5:
Quick Tip! You may find the Microsoft Assessment and Planning (MAP) Toolkit very helpful in rapidly collecting and synthesizing a strong core of documentation about your SQL Server and Windows server infrastructure.
If you constructed the workload properly, running the test itself is pretty easy, as shown below in Figure 6.
Figure 6: Tips for a Successful Baseline / Benchmark Test Run
There are a couple of tricky parts for a novice. First, one rather long and unpleasant learning experience for me, back in the day, was figuring out how to refresh the entire test environment quickly. In those days, we had to restore a rather large database, reboot all the servers, and reinitialize our monitoring instruments. While you can still use this approach, there are newer and faster alternatives, such as database snapshots and, my current favorite, restoring a VM and/or VM snapshots.
Don’t be afraid to abort the test if things go awry early on. Also, as mentioned earlier, only change one aspect of the SUT per test iteration.
You must determine which server you will use to monitor your SQL Server, as well as the location of any monitoring log files. The best practice is to monitor the SUT remotely so that it is not encumbered by the overhead of collecting metrics and/or writing to the same I/O subsystem used by SQL Server.
Monitoring with Microsoft native tools has many issues. Read our white paper entitled SQL Server, SharePoint & Windows Alerts to learn more.
Because of these performance concerns, the use of performance counters may need a bit of tuning. You will need to carefully evaluate the number of metrics and polling frequency that offers the best mix of insight versus overhead. For the initial baseline, however, it is recommended that as many counters as desired be used with the highest frequency available. Subsequent tests can collect fewer metrics and/or at a lower polling frequency. Once you have some experience with evaluating the resulting data, you will probably begin to limit the collections. In the absence of any other recommendations, collect the following throughout the baseline or benchmark test:
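To make the insight-versus-overhead trade-off concrete, here is a small back-of-the-envelope sketch. The bytes-per-sample figure is an illustrative assumption (not a documented PerfMon constant), but the arithmetic shows why a dense initial baseline collects orders of magnitude more data than a leaner follow-up run:

```python
# Hypothetical helper: estimate how many samples (and roughly how much log
# space) a PerfMon collection will generate, to weigh insight vs. overhead.
# The ~300 bytes-per-sample figure is an illustrative assumption.

def estimate_collection(counters: int, interval_sec: int,
                        duration_hours: float,
                        bytes_per_sample: int = 300) -> dict:
    polls = int(duration_hours * 3600 / interval_sec)   # polling events
    samples = polls * counters                          # counter readings
    return {
        "polls": polls,
        "samples": samples,
        "approx_mb": round(samples * bytes_per_sample / 1024**2, 1),
    }

# A dense initial baseline: 60 counters every 3 seconds for 24 hours...
dense = estimate_collection(counters=60, interval_sec=3, duration_hours=24)
# ...versus a leaner follow-up run: 20 counters every 30 seconds.
lean = estimate_collection(counters=20, interval_sec=30, duration_hours=24)
```

Running the numbers this way before a test helps you decide how many counters and what polling interval your monitoring host and log storage can comfortably absorb.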
Quick Tip! Don’t forget about
Long experience in assessing and interpreting the results of baselines has taught me not to rely on large tables of data with lots of data points. The mind glazes over quickly. Don’t throw away the source data held in the monitoring log files. Instead, use Excel to load the monitoring log data into a single graphic representation that makes sense to you. It provides enough information to interpret the results and makes it easy for you to understand the overall trend of the performance metrics.
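If you prefer scripting to Excel, the same log-to-chart step can be sketched with nothing but the Python standard library. The CSV layout below mimics PerfMon’s log format (timestamp first, one column per counter); the server name and values are fabricated for illustration:

```python
import csv
import io

# Minimal sketch: load a PerfMon-style CSV log (first column is the
# timestamp, remaining columns are counters) into per-counter series
# that you could then chart. Sample data is fabricated.
sample_log = io.StringIO(
    '"(PDH-CSV 4.0)","\\\\SQL01\\Processor(_Total)\\% Processor Time","\\\\SQL01\\Memory\\Available MBytes"\n'
    '"04/01/2020 09:00:00","45.2","2048"\n'
    '"04/01/2020 09:00:15","61.7","1997"\n'
    '"04/01/2020 09:00:30","58.3","2011"\n'
)

reader = csv.reader(sample_log)
header = next(reader)
series = {name: [] for name in header[1:]}  # one series per counter
timestamps = []
for row in reader:
    timestamps.append(row[0])
    for name, value in zip(header[1:], row[1:]):
        series[name].append(float(value))
```

From here, each `series` entry can be handed to any charting tool to produce the single consolidated graphic described above.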
Write a summary document to record all of your findings!
The Summary Document is the second most important product of a baseline or benchmark test. For baselines, the document should summarize things like common statistical values for important metrics, including min, max, average, median, and mode values. For benchmarks, the document should also summarize any important extrapolations and trends learned in the test. For example, in a benchmark to see how the current server performs with database compression enabled, the document might summarize that for every X amount of buffer used with compression enabled, CPU utilization increases by Y percent.
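The statistical values the Summary Document calls for are all available in Python’s standard library, so producing them from a captured series is a one-liner each. The sample values below are fabricated batch-requests-per-second readings:

```python
import statistics

# Sketch: summarize one captured metric for the Summary Document with
# the statistics recommended above (min, max, average, median, mode).
samples = [220, 240, 240, 250, 260, 310, 240, 230]

summary = {
    "min": min(samples),
    "max": max(samples),
    "average": statistics.mean(samples),
    "median": statistics.median(samples),
    "mode": statistics.mode(samples),
}
```

Repeating this per counter, per test run, gives you the comparable tables of statistics that make baseline-to-baseline comparisons meaningful.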
The Summary Document should also take into account the impact of important system processes. For example, the SUT might be able to sustain a higher number of concurrent users except during the preventative maintenance cycle due to resources required for processes like backups, DBCC processes, or index maintenance. Similarly, the document should take into account important business cycles, such as the end of month accounting close or a critical peak usage time.
Many IT teams make the mistake of thinking about their baselines and benchmarks as self-contained entities. However, you will find the real value of both baselines and benchmarks when their findings are used to fortify monitoring and alerting of your SQL Servers. Your baseline provides a context for alerting and reacting to problems.
If you haven’t already done so, it is urgent that you implement an ongoing, 24x7 monitoring and alerting system based upon your findings. The native tools provide a limited monitoring and alerting capability as described in our white paper SQL Server, SharePoint & Windows Alerts.
After spending all of this time describing how to collect a baseline and how to run a performance benchmark, here is some excellent news: If you use SQL Sentry, you don’t have to do either. SQL Sentry comes with built-in temporal baselines visible in the upper right corner of the Dashboard, starting in v8, as shown below in Figure 6.
Figure 6: SQL Sentry Performance Analysis Dashboard Dropdown
Typically, you can choose between one of three built-in baselines. The interface is smart, however: the built-in baselines available in the list change depending on the length of the active date range (that is, the Range Start and Range End at the upper left of Figure 6). For example, if you are viewing a 3-day range of performance data in the dashboard, a Previous Day baseline doesn’t make sense to display, so we don’t offer the option. Also, the Range option is currently available only when viewing detail data (range <= 30 minutes); otherwise, the Average option is displayed as an overlay in the dashboard.
In Figure 7, shown below, you can see a Previous Day baseline with a Range style overlay on the Performance Advisor Dashboard. You can see that patterns emerge on several charts with spikes in CPU,
Figure 7: Performance Analysis Dashboard with Previous Day Temporal Baseline Overlay
When looking at longer time ranges, or when you select the Average radio button in the top-right corner, SQL Sentry will display the average value of each metric as a dashed line on its chart. As is normal for SQL Sentry Performance Analysis, you may drill deeper into the interface or Jump To other sections of the interface, such as the Top SQL or Disk Activity tabs.
In the database industry, a benchmark is a test in which you collect the performance metrics of a system under a specific, pre-defined workload. Performance Advisor does not include a workload-generation feature, but what better indicator of performance under load could there be than measuring your own specific, pre-defined workloads? In other words, benchmarks in the general industry equate to custom baselines in SQL Sentry.
So in addition to the built-in temporal baselines, you can create your own custom baselines that are calibrated explicitly against specific time periods and workloads! To do so, just click and drag to select a date range on any of the charts, right-click to invoke the context menu, and then select the Create Baseline menu option, as shown below in Figure 8.
Figure 8: Creating a Custom Baseline
When you click the Create Baseline menu option, the Edit Baseline screen appears, showing the Average, Min, Max and Standard Deviation for each dashboard metric over the specified time range, as shown below in Figure 9.
Here, we’ll call this custom baseline "Business Hours" to represent normal and acceptable performance during that time. You can also select the checkbox in the Default column to choose which metrics are shown by default on multi-metric charts. You can also manually adjust the baseline value by editing a specific value in the Average column. For example, you might decide that the average value for Page Life Expectancy (17,423.0) is not representative and simply change it to your preferred value (just not 300, please). Clicking OK closes the dialog box and saves any changes.
Figure 9: Edit Baseline Dialog Box
Once saved, the baseline is added to the Baseline dropdown list, as you can see in Figure 10 below, where the “Business Hours” custom baseline appears. When you select your new custom baseline, a horizontal line appears on each chart showing the baseline value for each metric. The line provides a reference point so you can easily spot unusual activity.
In Figure 10, you can also see that at around 10:10 AM something started impacting performance across several of the charts.
Figure 10: Displaying a Custom Baseline on the Performance Analysis Dashboard
From here you can highlight the range and Jump To other areas like Top SQL, Processes or the SQL Sentry Event Manager Calendar to diagnose the root cause.
There are many other use cases like this for custom baselines. Here are a few:
You can create as many custom baselines as you want, and edit them at any time by clicking the build button to the right of the drop-down. For multi-metric charts, you can change the default metric shown without editing the baseline by right-clicking it in the legend.
These visualization capabilities are great, but the feature wouldn't be complete without the ability to generate alerts using baseline metrics. That's where the other new SQL Sentry functionality comes in. When a custom baseline is created at the Windows Server or SQL Server level, it becomes available for use in custom conditions at that level.
Using baselines in a custom condition is easy. Instead of setting an explicit threshold value on the right side of a comparison, select Performance Counter Baseline, and the baseline’s value for the metric is used as the threshold.
In the custom condition shown in Figure 11 below, we've added three separate baseline tests with some basic math to control the amount of a deviation from the baseline value required to trigger the alert.
Figure 11: Creating a Custom Condition Based on the Business Hours Custom Baseline
The system knows to use the Business Hours baseline value for the counter on the left side of the comparison. In effect, the interface uses context-aware shorthand that will trigger the alert whenever Total SQL Server Waits are three or more times greater than the baseline, when Page Life Expectancy is half the baseline or less, or when Total Server Processor Time is twice the baseline.
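The underlying arithmetic of these baseline-relative conditions is easy to sketch. The counter names, baseline values, and live readings below are illustrative, not taken from a real SQL Sentry configuration:

```python
# Sketch of baseline-relative alerting: compare a live metric to a stored
# baseline value with a multiplier controlling how much deviation is
# required before an alert fires. All values here are hypothetical.
baseline = {
    "Total SQL Server Waits": 120.0,   # ms of waits/sec, hypothetical
    "Page Life Expectancy": 17423.0,   # seconds
    "% Processor Time": 35.0,          # percent
}

def breaches(metric: str, live_value: float, factor: float,
             direction: str) -> bool:
    """direction='above': alert when live >= baseline * factor;
       direction='below': alert when live <= baseline * factor."""
    threshold = baseline[metric] * factor
    if direction == "above":
        return live_value >= threshold
    return live_value <= threshold

# Waits at 3x baseline or more, PLE at half baseline or less, CPU at 2x:
alerts = [
    breaches("Total SQL Server Waits", 400.0, 3.0, "above"),
    breaches("Page Life Expectancy", 9000.0, 0.5, "below"),
    breaches("% Processor Time", 60.0, 2.0, "above"),
]
```

Only the waits check fires here (400 >= 3 × 120), which is exactly the kind of selective, deviation-based alerting the custom condition expresses.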
(Note: It is possible to select a baseline on the left side; however, you'll forgo the shorthand when doing so. Since the condition generally reads more naturally with the counter on the left and the threshold on the right, we recommend this approach anyway.)
No other product provides as rich an alerting interface, nor as graceful a method of deriving and incorporating baseline values into your alerting system.
No matter the size of your SQL Server estate, remember that you’ll benefit strongly from baselines and the contextual information they provide about your servers.
A baseline shows you what SQL Server looks like when it is acting normally. It provides you with an empirical context for building a comprehensive service-monitoring regimen which, in turn, enables you to provide exceptional levels of customer service, uptime, and proactive management of your SQL Server infrastructure. Without a baseline, you risk many potential problems with your SQL Servers:
So if it is normal for your server to be heavily utilized and then, suddenly, it is lightly utilized one day, you can tell that something is out of the ordinary. Perhaps the network is having trouble and the users, your customers, are not connecting to your server for some reason.
In contrast, a benchmark shows you SQL Server performance under a specific, pre-defined workload. This can help you determine the SQL Server’s maximum performance capabilities and its specific limitations, such as the maximum number of users it can concurrently support.
SQL Sentry turns the long and arduous process of creating baselines and benchmark measurements into a simple 3-click process.
While it is possible to incorporate any baseline or benchmark findings into the alerting of the SQL Server native tools, it requires a great deal of manual work, Transact-SQL coding, and possibly application coding as well. Plus, using the SQL Server native tools requires you to conduct a database performance baseline or benchmark test manually, assess the findings, and then manually build out an entire set of alerts based upon those findings. Using Performance Advisor, you can automatically collect real system performance measurements, turn them into baselines, and then reference them in custom conditions to trigger the appropriate alerts. That way, you’re sure that you’re only alerted to situations which are indeed significant and important to the monitored host.
Not only does SentryOne SQL Sentry provide built-in temporal baselines, but it also provides the ability to create custom baselines: in essence, specific performance benchmarks. In addition, SQL Sentry Performance Analysis baseline metrics may be used as the basis for all of your alerting. So while other alerting systems struggle with these problems, Performance Advisor is designed to handle them all from the ground up - quickly, directly, and easily. SQL Sentry also makes it easy to scale out alerting throughout your IT enterprise by enabling you to set baseline-driven alerts for as many or as few servers in your global IT environment as you like.
For more information about general alerting and monitoring strategies, refer to the MSDN Monitoring Best Practices article at https://msdn.microsoft.com/en-us/library/windows/desktop/bb226833(v=vs.85).aspx.
For more details on SQL Sentry advanced alerting and baselining, refer to the blogs of SentryOne Founder and CTO Greg Gonzalez at http://blogs.sentryone.com/category/baselines/.
Finally, you can download a free, 15-day trial of SQL Sentry, which is part of the all-in-one SentryOne Platform suite, here:
When configuring PerfMon for a baseline or benchmark test, consider using the following objects and counters. Configure the PerfMon session to record its log file to a location other than the system under test. Set the polling interval to be as infrequent as is acceptable, keeping in mind the added overhead created by frequent polling. A typical polling interval might be anywhere from every 3 seconds to every 30 seconds, or even every minute, for some baseline or benchmark tests.
| Object | Counter | Rule of Thumb | Notes |
|---|---|---|---|
| Paging File | % Usage | < 70% | The amount of page file currently in use. High page file usage indicates memory shortages so severe that they must spill to the I/O subsystem, resulting in a serious performance bottleneck. |
| Processor | % Processor Time | <= 80% | The amount of total CPU in use across all system activities. The higher this metric, the more likely users will be delayed. |
| Processor | % Privileged Time | < 40% of % Processor Time | The amount of time spent executing kernel commands (i.e., real work) like SQL Server I/O requests. This value should seldom rise above 40% of total % Processor Time. |
| Process (sqlservr) | % Processor Time | < 80% | The amount of CPU used solely by SQL Server (the sqlservr.exe process). |
| System | Processor Queue Length | < 2 per CPU | The single best indicator of CPU pressure on the Windows server. It shows how many threads are currently waiting for time on a CPU. Less than 12 per CPU is good/fair, less than 8 is better, and less than 2 is best. |
| System | Context Switches/sec | < 1500 | The number of execution contexts switched in the last second. High-end hardware can now handle very high rates of context switching but, in the absence of any other indicators, greater than 6000 is a busy system, less than 6000 is fair, less than 3000 is good, and less than 1500 is excellent. |
| Physical Disk | Avg ms/read | < 8 | The time, in milliseconds, spent waiting on average to complete a read operation, also known as read latency. |
| Physical Disk | Avg ms/write | < 1 | The time, in milliseconds, spent waiting on average to complete a write operation, also known as write latency. As with the previous metric, Microsoft’s guidance for HDD-based systems with a write-cache is that a write latency greater than 4 is poor, less than 4 is fair, less than 2 is better, and 1 or less is best. On I/O subsystems without a write-cache, a write latency greater than 20 is poor, less than 20 is fair, less than 12 is better, and 8 or less is best. |
| Memory | Available Mbytes | >= 300 | The amount of physical memory, in megabytes, available to run processes on the machine. You generally want a few hundred MB available to handle the occasional file copy or similar activity needed for system administration. |
| SQL Server: Memory Manager | Memory Grants Pending | ~0 | The current number of processes waiting for a workspace memory grant within SQL Server. You want this value to be close to zero most of the time. If it is frequently greater than zero, then SQL Server needs more memory or is running T-SQL code that is wasteful of memory. |
| SQL Server: Buffer Manager | Free List Stalls/sec | < 2 | The frequency with which requests for data buffer pages are suspended (a.k.a. “stalled”) because there are no free data cache buffers. If this metric is greater than 2, then SQL Server needs more memory or is running T-SQL code that is wasteful of memory. |
| SQL Server: Databases | Log Flush Waits/sec | ~0 | Effectively, the number of times per second that SQL Server must wait for pages to be written to the transaction log. Only highly stressed systems should have a sustained value greater than zero. |
| SQL Server: General Statistics | User Connections | Proportionate to Batch Requests/sec | The number of users currently connected to the SQL Server. For most workloads, it should roughly track with Batch Requests/sec, rising and falling together. Taken with Batch Requests/sec, it is a good indicator of overall user activity. |
| SQL Server: Buffer Manager | Stolen Pages/sec | Proportionate to Batch Requests/sec | The number of 8-KB pages “stolen” from the data cache to satisfy other memory needs, such as plan cache and workspace memory. It’s a good indicator of the rate of data flow among SQL Server caches and should remain proportionate to Batch Requests/sec. |
| SQL Server: SQL Statistics | Batch Requests/sec | Proportionate to workload | The overall busyness of a SQL Server, that is, the number of batch requests the database engine receives per second. High-end servers running modern versions of SQL Server can sustain tens or even hundreds of thousands of batch requests per second on a well-tuned workload. (Note: this counter does not track all Hekaton transactions.) |
| SQL Server: SQL Statistics | SQL Compilations/sec | < 10% of Batch Requests/sec | The number of T-SQL compilations that occur per second, including recompiles. In general, this value should stay below roughly 10% of Batch Requests/sec. |
| SQL Server: SQL Statistics | SQL Re-Compilations/sec | < 10% of SQL Compilations/sec | The number of times, per second, that compiled T-SQL code attempted to execute but had to be recompiled before completion. This number is preferably at or near zero. |
This list is for quick reference only. For a comprehensive list of potential PerfMon objects and counters, consult the documentation for your version of Windows and SQL Server.
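As one way to put the table above to work, the counter paths can be written to a plain-text settings file that the standard Windows collection tools (e.g., typeperf or logman, both of which accept a counter file via their `-cf` flag) can consume. This is a sketch; the paths use standard PerfMon syntax, but adjust instance names (and the named-instance prefix `MSSQL$...:` instead of `SQLServer:`) for your environment:

```python
# Sketch: write PerfMon counter paths (mirroring the table above) into a
# settings file for typeperf/logman. Instance names are assumptions --
# adjust _Total and SQLServer: prefixes for your environment.
counters = [
    r"\Paging File(_Total)\% Usage",
    r"\Processor(_Total)\% Processor Time",
    r"\Processor(_Total)\% Privileged Time",
    r"\System\Processor Queue Length",
    r"\System\Context Switches/sec",
    r"\PhysicalDisk(_Total)\Avg. Disk sec/Read",
    r"\PhysicalDisk(_Total)\Avg. Disk sec/Write",
    r"\Memory\Available MBytes",
    r"\SQLServer:Memory Manager\Memory Grants Pending",
    r"\SQLServer:Buffer Manager\Free list stalls/sec",
    r"\SQLServer:SQL Statistics\Batch Requests/sec",
]

with open("baseline_counters.txt", "w") as f:
    f.write("\n".join(counters) + "\n")

# Example invocation from the monitoring host (15-second interval,
# logging away from the SUT), per the guidance above:
#   typeperf -cf baseline_counters.txt -si 15 -o D:\PerfLogs\baseline.csv
```

Keeping the counter list in a file makes the collection repeatable across test runs, which supports the "reliable and repeatable" rule discussed earlier.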
The following wait statistics may be useful when creating a baseline or running a benchmark test. You need only collect the appropriate data from the sys.dm_os_wait_stats DMV (dynamic management view) while running your tests. Be certain to also record the time at which the data is collected, since wait statistics are cumulative and cannot, by themselves, give you a moment-by-moment record of their growth over time. As with collecting PerfMon objects and counters during a baseline or benchmark test, record the results to a location other than the system under test.
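Because the DMV’s values are cumulative since the last restart (or since the stats were cleared), a baseline needs the difference between snapshots taken at known times. This sketch computes per-second deltas from two fabricated snapshots of `wait_time_ms` keyed by `wait_type`:

```python
# Sketch: turn two cumulative sys.dm_os_wait_stats snapshots into
# per-second wait accumulation rates. The wait types are real SQL Server
# wait types; the millisecond values are fabricated for illustration.
snapshot_start = {"PAGEIOLATCH_SH": 125_000, "WRITELOG": 48_000, "CXPACKET": 310_000}
snapshot_end   = {"PAGEIOLATCH_SH": 143_000, "WRITELOG": 51_600, "CXPACKET": 310_000}
elapsed_seconds = 600  # snapshots taken 10 minutes apart

delta_per_sec = {
    wait: (snapshot_end[wait] - snapshot_start[wait]) / elapsed_seconds
    for wait in snapshot_start
}
# PAGEIOLATCH_SH accumulated (143000 - 125000) / 600 = 30 ms of wait
# per second over the interval; CXPACKET did not move at all.
```

Storing each snapshot with its timestamp is what makes this delta arithmetic possible, which is why the collection time must be recorded alongside the DMV output.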