The ability to conduct a database performance test to establish baseline measurements and workload performance benchmarks is one of the skills strongly distinguishing senior DBAs from junior DBAs. Once a DBA is proficient in baselining and benchmarking database performance, they hate to operate without them because they’re so valuable.
First, a quick explanation of the differences between a baseline and a benchmark, plus some more bits of nomenclature.
Just like in the health example, the baseline metrics define what is normal for the database application’s performance, and, intuitively, we can only know what is abnormal about a Microsoft® SQL Server® instance by first establishing what’s normal. For a DBA, few metrics are more valuable or have a more far-reaching impact than a baseline.
The purpose of baselining and benchmarking is to let you know, definitively, what’s normal and abnormal for your SQL Server. Without this knowledge, you’re forced to rely on broad rules of thumb and hunches.
There are several reasons why few DBAs master performance baselining and benchmarking. We could just say, “It’s hard!” and leave it at that. We can also safely say DBAs are so busy coping with the most urgent issues of the day there’s no time left over to conduct a time- and resource-intensive set of tests. Both statements are true, but they don’t go far enough. Instead, let’s take a moment to more fully characterize the most challenging issues with creating baseline metrics and benchmarks.
First, the ability to build baselines and benchmarks requires a lot of knowledge about the following:
After all, if you’re trying to construct a performance baseline for a given database application and you change even one aspect of the underlying hardware, you can reasonably expect an entirely different set of values from the baseline.
Second, the DBA must also possess a thorough understanding of the following:
And if this weren’t enough, the DBA also needs to know a good deal about monitoring the SUT and how to assess the metrics captured by those monitoring tools.
The challenges in collecting a baseline can be daunting, but the benefits of having a baseline are well worth the time and energy. The baseline isn’t a valuable thing in and of itself. It only gains value when you use it as the basis for service monitoring, which means observing and assessing the health of the database service in real time.
When I first started as a DBA in the early ’90s, the extent of my service monitoring was my telephone. If it rang a lot with unhappy users at the other end, I knew service was bad. If my phone didn’t ring, I assumed everything was OK. It didn’t take long to realize how foolish this approach was. For one thing, users are never happy when they call about a problem. They’re already mad things aren’t meeting their expectations. This anger rubs off on you, the DBA, by association. You lose credibility.
Instead, why not find out what’s normal for my server and then observe server behavior proactively and quantitatively? As soon as I implemented service monitoring, I found a variety of benefits, including the following:
In this whitepaper, you’ll get a crash course in building a database performance baseline and how to collect one or more benchmarks, how to incorporate your findings into service monitoring, and how SolarWinds® SQL Sentry® takes all the labor out of the ordeal.
In this section, we’ll walk through the steps to construct your own baseline and the most important benchmarks you’ll need to proactively manage your database systems.
The first bit of advice is to always work from a project plan, complete with a timeline, deadlines, and assigned roles. The test plan should essentially be a paper representation of everything the performance baseline or benchmark will encompass. It should walk through the test in step-by-step detail and describe the following:
I can tell you from hard-won experience that many weeks of hard work were ruined because the person or team performing the benchmark failed to prepare properly. Such mistakes were caused by forgetting a cardinal rule of benchmarks: a benchmark must produce reliable and repeatable results so we can foster rational and actionable conclusions.
Keeping the “reliable and repeatable” mantra in mind necessitates a few extra steps, especially in benchmarks.
The goal of a baseline is to understand and measure server performance under normal conditions, while the goal of a benchmark is to understand and measure server performance under a specific, predefined workload. Does the application eat up all the CPU cycles made available to it? Does it need so much memory that paging occurs? Does the application code thrash the TempDB database? Are all the user requests serviced quickly? After you understand how the application acts under normal load, you’ll be able to answer these questions.
If you’ve never done any of this before, you’ll want at least three distinct measurements for a SUT:
This last measurement is often overlooked by administrative staff, who simply trust all backups, index rebuilds, and DBCC checks can finish in time and with minimal interruptions. But as I said before, if you don’t know what’s normal for these workloads, how can you tell when it’s abnormal?
In the following section, we’ll look at the steps to run a successful baseline or benchmark test. There are a couple of hallmarks of a successful database performance baseline or benchmark test:
The following steps, as shown in Figure 1, help ensure your success.
Figure 1: Five steps to run a successful baseline/benchmark test.
When gathering a baseline or running a benchmark test, it’s important to clearly define the requirements in terms of functionality, performance, and scalability. Of course, your baseline is the set of metrics for the server under a “normal” workload. However, by restating the requirements explicitly, you ensure any assumptions built into the application or administrative approach are corrected (or at least documented).
Business requirements for the baseline/benchmark provide the DBA context for the expectations of their customers, the end users.
As a first step, define the business requirements in nontechnical terms. Some examples might be as follows:
This process of defining requirements frequently helps the DBA understand the true needs of their customers (i.e., the end users) and provides important context for the technical requirements expected of their systems.
When conducting “what-if” benchmarks, you should only ever alter one component or setting at a time.
However, many benchmarks are intended to prove the feasibility or benefit of a proposed architecture. For example, the DBA may have proposed enabling the SQL Server database compression features on an important production application. How will this affect performance? Only a careful benchmark test can reveal what’s in store. However, a key to successful “what-if” benchmarking is to alter only one important component or setting at a time. If more than one thing is changed, how can you tell which one contributed to new measurements? Did one change help by +4x but the other change hurt by -2x, leading to a seemingly acceptable +2x in improvement?
When building your first baseline, you may not need any more workload than what’s already available on the production application. However, collecting a production workload and then replaying it on a test server is easier said than done, especially when factoring in normal user latencies like keying time and think time. Figure 2 offers a couple shortcuts that are often good enough, especially for “what-if” benchmarks.
Figure 2: Constructing a workload.
Of course, the best database benchmarks are those capable of accurately and reliably reflecting the applications and configuration of your own database infrastructure. But the amount of work it takes to extract your own transactional workload can be immense. The easier route is to learn and run industry standard benchmarks, such as those of the TPC, which reasonably simulate the workload of your SUT and extrapolate the test results for your environments.
My current favorite tool for performing TPC-like benchmarks is HammerDB, shown in Figure 3. HammerDB is a free, open-source load testing tool capable of running a workload based upon the TPC-C and TPC-H benchmarks on several different database platforms. It allows you to simulate heavy transaction processing and business intelligence workloads.
HammerDB is especially useful to understand the performance implication of enabling or disabling specific OS or database features, alternative hardware choices, or VM architectures. But because it’s a canned workload, you must take pains to study it so you can effectively extrapolate the possibly great difference between its workload and your own workload. Your mileage may vary, as they say.
Figure 3: HammerDB testing a TPC-C workload on Microsoft SQL Server.
HammerDB is distributed via GitHub and is usually updated several times per year by the all-volunteer development team. It’s easy to learn and can drive heavy workloads against your database servers with minimal configuration or setup. It’s possible to modify the workload HammerDB drives, so if you want to drive a heavy workload entirely of your own creation, it’s possible to do, assuming you can write SQL code. If you encounter difficulties, HammerDB features great documentation and an active discussion forum where the developers are helpful and respond quickly to questions.
If you wish to test in-house created application workloads, your best compromise between speed of execution and realism is often to enlist one or more developers to create a variation of the client application designed to generate a randomized but realistic workload. It doesn’t need to be pretty or fancy. It just needs to be a scalable and bug-free way to generate a real SQL Server workload against the database under test using the same sort of queries and transactions real users would generate.
In SQL Server 2012, Microsoft added the Distributed Replay Utility to run a multi-threaded workload against SQL Server to benchmark the impact of an upgrade from an older version of SQL Server to the most recent version. However, with the addition of the Database Experimentation Assistant (DEA), it also offers a fully scalable solution to simulate mission-critical SQL Server workloads, conduct performance testing, and undertake “what-if” tests and capacity planning.
Figure 4: DEA from Microsoft Docs.
DEA is composed of four main components, as shown in Figure 4. Though it’s a no-cost tool, it can be complex to work with and has several prerequisites. The DReplay controller and DReplay clients make up the first two components, controlling and generating the workload and pushing it to the target servers. The target servers—where the workload is replayed—and the Analysis Server—where benchmarking analysis is performed—make up the last two components. Since DEA uses trace files for its workload, you’ll also need to know how to create or capture SQL Server traces.
It seems like an obvious requirement, but the SUT should be isolated from other production, development, or test environments. Sure, it sounds easy enough. But in today’s world of consolidated disk storage and consolidated servers—not to mention virtual machines—this can be expensive and difficult.
If possible, you want your benchmarking environment to share no components with other active systems. As you can guess, this need for isolation is one of the most expensive elements of a good baseline or benchmark. (The other expense is the hours invested in the project.) Imagine, for example, you were benchmarking on a computer also serving as a user’s workstation. How could you be certain a specific performance anomaly wasn’t actually the user’s Outlook® application synchronizing with the Exchange server? This sort of shared environment can completely invalidate a benchmark.
Other elements of isolation usually necessary are as follows:
Figure 5: Isolate and document the SUT.
I’ve seen many teams attempt to analyze their benchmark results only to discover they have no idea what the Windows® registry and other settings are for the server they were testing. Did the server enable Lock Pages in Memory? Was the NIC set with Jumbo Frames? Which CPU power plan was the server using? Clearly, it’s critical to fully document the SUT. Be sure to capture all the following bits of information in a document (this is also shown in Figure 5):
You may find the Microsoft Assessment and Planning (MAP) Toolkit helpful in rapidly collecting and synthesizing a strong core of documentation about your SQL Server and Windows Server infrastructure.
If you constructed the workload properly, running the test itself is pretty easy, as shown in Figure 6.
Figure 6: Tips for a successful baseline/benchmark test run.
There are a couple of tricky parts for a novice. First, a rather long and unpleasant learning experience for me was figuring out an effective way to quickly refresh the entire test environment. In those days, we had to restore a rather large database, reboot all the servers, and reinitialize our monitoring instruments. Though you can still use this approach, there are newer and faster alternatives, such as database snapshots and—my current favorite—restoring a VM and/or VM snapshots.
Don’t be afraid to abort the test if things go awry early on. And, as mentioned earlier, only change one aspect of the SUT per test iteration.
You must determine which server you’ll use to monitor your SQL Server and the location of any monitoring log files. The best practice is to monitor the SUT remotely, so it isn’t encumbered by the overhead of collecting metrics and/or writing to the same I/O subsystem used by SQL Server.
Because of these performance concerns, the use of performance counters may need a bit of tuning. You’ll need to carefully evaluate the number of metrics and polling frequency offering the best mix of insight versus overhead. For the initial baseline, however, it’s recommended to use as many counters as desired with the highest frequency available. Subsequent tests can collect fewer metrics and/or at a lower polling frequency. Once you have some experience evaluating the resulting data, you’ll probably begin to limit the collections. In the absence of any other recommendations, collect the following throughout the baseline or benchmark test:
Don’t forget about TempDB! TempDB is a major source of resource consumption and contention on many multi-database systems. Collect information about TempDB performance as you would any other production database when you test.
Long experience at assessing and interpreting the results of baselines has taught me not to rely on large tables of data with many data points. The mind glazes over quickly. Don’t throw away the source data held in the monitoring log files. Instead, use Microsoft Excel® to load the monitoring log data into a single graphic representation you can understand. It provides enough information to interpret the results and makes it easy for you to understand the overall trend of the performance metrics.
Write a summary document to record your findings!
The summary document is the second most important product of a baseline or benchmark test. For baselines, the document should summarize things like common statistical values for important metrics, including minimum, maximum, average, median, and mode values. For benchmarks, the document should also summarize any important extrapolations and trends learned in the test. For example, in a benchmark to see how the current server performs with database compression enabled, the document might summarize that for every X amount of buffer used with compression enabled, CPU utilization increases by Y percent.
The summary document should also consider the impact of important system processes. For example, the SUT might be able to sustain a higher number of concurrent users except during the preventative maintenance cycle due to resources required for processes like backups, DBCC processes, or index maintenance. Similarly, the document should consider important business cycles, such as the end of month accounting close or an important peak usage time.
Many IT teams make the mistake of thinking about their baselines and benchmarks as self-contained artifacts. However, you’ll find the real value of both baselines and benchmarks when their findings are used to fortify monitoring and alerting for your SQL Server instances. Your baseline provides context for alerting and reacting to problems.
If you haven’t already done so, it’s urgent for you to implement an ongoing, 24/7 database monitoring and alerting system based on your findings. Find out how SolarWinds SQL Sentry can help you today!
After spending all this time describing how to collect a baseline and how to run a performance benchmark, here’s some wonderful news: if you use SQL Sentry, you don’t have to do either one. SQL Sentry comes with built-in temporal baselines visible in the upper right corner of the dashboard, as shown in Figure 7.
Figure 7: Temporal baseline drop-down menu on the SQL Sentry dashboard.
Typically, you can choose between one of three built-in baselines. It’s a smart interface, however, and the built-in baselines available from the list will change depending on the time range of the active date range (i.e., the Range Start and Range End in the upper-left corner of Figure 7). For example, if you’re viewing a three-day range of performance data in the dashboard, the Previous Day doesn't make sense to display, so we don’t provide the option. Also, the Range option is currently only available when viewing detail data (range <= 30 minutes). Otherwise, the Average option will be displayed as an overlay in the dashboard.
In Figure 8, you can see a Previous Day baseline with a Range style overlay on the SQL Sentry Dashboard. You can see patterns emerge on several charts, with spikes in CPU, wait statistics, and database I/O just after 8:00 a.m. The dashed lines and shaded area within, best illustrated in the Log Flushes graph on the middle bottom right of the dashboard, present a moving average showing the min/max of normal values for this time of day. As it turns out, the spikes in waits, CPU, etc. are well within normal behavior for this time of day. Of course, normal doesn't necessarily mean acceptable. This functionality is simply not possible with the native Microsoft tools.
Figure 8: SQL Sentry dashboard with previous day temporal baseline overlay.
When looking at longer time ranges—or when you choose it by selecting the Average radio button in the upper-right corner—the SQL Sentry dashboard will display the average value in each graph as a dashed line. As is normal for SQL Sentry, you can drill deeper into the interface or Jump To other sections of the interface, such as the Top SQL or Disk Activity tabs.
In the database industry, a benchmark is a test in which you collect the performance metrics of a system under a specific, predefined workload. SQL Sentry doesn’t have a workload generation feature built in, but what could be a better indicator of performance under load than to measure specific, predefined workloads? In other words, benchmarks in the general industry equal custom baselines in SQL Sentry.
In addition to the built-in temporal baselines, you can create your own custom baselines specifically calibrated against specific time periods and workloads. To do so, simply click and drag to select a date range on any of the charts, right-click to invoke the context menu, and then select the Create Baseline menu option, as shown in Figure 9.
Figure 9: Creating a custom baseline.
When you click the Create Baseline menu option, the Edit Baseline screen appears. This shows the Average, Min, Max, and Standard Deviation for each dashboard metric over the specified time range, as shown in Figure 10.
Here, we’ll call this custom baseline "Business Hours" to represent normal and acceptable performance during this time. You can also select the checkbox in the Default column to choose which metrics are shown by default on multi-metric charts. Additionally, you can manually adjust the baseline value by editing a specific value in the Average column. For example, you might decide the average value for Page Life Expectancy (17,423.0) isn’t representative and simply change it to your preferred value (just not 300, please). Clicking OK closes the dialog box and saves any changes.
Figure 10: Edit Baseline dialog box.
Once saved, the baseline is added to the Baseline drop-down list, as you can see in Figure 11, where the “Business Hours” custom baseline appears. When you select your new custom baseline, a horizontal line appears on each chart showing the baseline value for each metric. The line provides a reference point so you can easily spot unusual activity.
In Figure 11, you can also see at around 10:10 a.m., something started impacting disk waits, CPU, PLE, and page reads, causing them to deviate from the normal business hours level. Time to investigate!
Figure 11: Displaying a custom baseline on the SQL Sentry dashboard.
There are many other use cases like this for custom baselines. Here are a few:
You can create as many custom baselines as you want and edit them at any time by clicking the build button to the right of the drop-down menu. For multi-metric charts, you can change the default metric shown without editing the baseline by right-clicking it in the legend.
These visualization capabilities are great, but the feature wouldn't be complete without the ability to generate alerts using baseline metrics. This is where the other SQL Sentry functionality comes in. When a custom baseline is created at the Windows Server or SQL Server level, it becomes available for use in Advisory Conditions at that level.
Using baselines in an Advisory Condition is easy. Instead of setting an explicit threshold value on the right side of a comparison, select Performance Counter Baseline, and its value for the metric is automatically substituted. This way, if a baseline value ever changes, you won't need to update any conditions.
In the Advisory Condition shown in Figure 12, we've added three separate baseline tests with some basic math to control the amount of deviation from the baseline value required to trigger the alert.
Figure 12: Creating an Advisory Condition based on the Business Hours custom baseline.
The system knows to use the Business Hours baseline value for the counter on the left side of the comparison. Basically, the interface uses context-aware shorthand that will trigger the alert whenever Total SQL Server Waits are three or more times greater the baseline, when PLE is half the baseline, or when Total Server Processor Time is twice the baseline.
(Note: it’s possible to select a baseline on the left side, but you'll forego the shorthand when doing so. Since it generally reads easier with the counter on the left and threshold on the right, we recommend this approach anyway.)
No matter the size of your SQL Server estate, remember you’ll benefit strongly from baselines and the contextual information they provide about your servers.
A baseline shows you what SQL Server looks like when it’s acting normally. It provides you with an empirical context for building a comprehensive service monitoring regimen. This enables you to provide exceptional levels of customer service, uptime, and proactive management for your SQL Server infrastructure. Without a baseline, you risk many potential problems with your SQL Server instances, including the following:
If it’s normal for your server to be heavy utilized and then, suddenly, it’s lightly utilized one day, you can tell something is out of the ordinary. Perhaps the network is having trouble and the users—your customers—aren’t connecting to your server for some reason.
In contrast, a benchmark shows you SQL Server performance under a specific, predefined workload. This can help you determine the SQL Server instance’s maximum performance capabilities and the specific limitations of its capabilities, such as the maximum number of users it can concurrently support.
SQL Sentry turns the long and arduous process of creating baselines and benchmark measurements into a simple process.
Though it’s possible to incorporate any baseline or benchmark findings into the alerting of the SQL Server native tools, it requires a great deal of manual work, Transact-SQL coding, and possibly application coding. Plus, using the SQL Server native tools requires you to manually conduct a database performance baseline or benchmark test, assess the findings, and manually build out an entire set of alerts based upon those findings. Using SQL Sentry, you can automatically collect real system performance measurements, turn them into baselines, and reference them in Advisory Conditions to trigger the appropriate alerts. This helps you make sure you’re only alerted to truly significant situations important to the monitored host.
SQL Sentry doesn’t just provide built-in temporal baselines—it also provides the ability to create custom baselines (in essence, specific performance benchmarks). Additionally, SQL Sentry baseline metrics can be used as the basis for all your alerting. Though other alerting systems struggle with these problems, SQL Sentry is designed to handle them all from the ground up—quickly, directly, and easily. SQL Sentry also makes it easy to scale out alerting throughout your IT enterprise by enabling you to set baseline-driven alerts for as many or as few servers in your global IT environment as you’d like.
For more details about the advanced alerting and baselining capabilities in SQL Sentry, refer to the baseline blog posts here.
Finally, for a free 14-day trial of SQL Sentry, click here.
When configuring PerfMon for a baseline or benchmark test, consider using the following objects and counters. By default, set the PerfMon session to record the log file to a location other than the system under test. Set the polling frequency to be as low as is acceptable, keeping in mind the added overhead created by a frequent polling interval. A typical polling interval might be anywhere from every three seconds to 30 seconds (or even every minute) under some baselines or benchmark tests.
|Object||Counter||Rule of Thumb||Notes|
|Paging File||% Usage||< 70%||The amount of page file currently in use. High page file usage indicates memory shortages so severe they must be spilled to the I/O subsystem, resulting in a serious performance bottleneck.|
|Processor||% Processor Time||<= 80%||The amount of total CPU in use across all system activities. The higher this metric is, the more likely users will be delayed.|
|Processor||% Privilege Time||< 40% of % Processor Time||The amount of time spent executing kernel commands (i.e., real work) like SQL Server I/O requests. This value should seldom rise above 40% of total % Processor Time.|
|Process (sqlservr) Process (msmdsrv)||% Processor Time||< 80%||This is the amount of CPU used solely by SQL Server (sqlservr) and/or Analysis Services (msmdsrv) process threads. It’s even more important to monitor these metrics if the server is running both services simultaneously. In dedicated servers, you may monitor the single service in use on the machine.|
|System||Processor Queue Length||<2||This is the single best indicator of CPU pressure on the Windows Server. It shows how many threads are currently waiting on an answer by a CPU on the server. Less than 12 per CPU is good/fair, less than eight is better, and less than two is best.|
|System||Context Switches/Sec||<1500||This metric shows the number of execution contexts switched in the last second. High-end hardware can now handle high amounts of context switching, but in the absence of any other indicators, greater than 6,000 is a busy system, less than 6,000 is fair, less than 3,000 is good, and less than 1,500 is excellent.|
|Physical Disk||Avg ms/read||<8||This metric indicates the time, in milliseconds, spent waiting to complete a read operation on average. It’s also known as read latency. The most common physical bottleneck is usually the I/O subsystem. The Microsoft guidance for HDD-based systems says a read latency of greater than 20 is poor, less than 20 is good/fair, less than 12 is better, and less than eight is best.|
|Physical Disk||Avg ms/write||<1||This metric indicates the time, in milliseconds, spent waiting to complete a write operation on average. It’s also known as write latency. As with the previous metric, the Microsoft guidance for HDD-based systems says a write latency of greater than four is poor, less than four is fair, less than two is better, and one or less is best on systems with a write-cache. On I/O subsystems without a write-cache, a write latency greater than 20 is poor, less than 20 is fair, less than 12 is better, and eight or less is best.|
|Memory||Available MBytes||=>300||An indicator showing the amount of physical memory, in megabytes, available to run processes on the machine. You generally want a few hundred MB available to handle the random file copy or similar activity needed for system administration.|
|SQL Server: Memory Manager||Memory Grants Pending||~0||This counter shows the current number of processes waiting for a workspace memory grant within SQL Server. You want this value to be close to zero most of the time. If it’s frequently greater than zero, then SQL Server needs more memory or is running T-SQL code wasteful of memory.|
|SQL Server: Buffer Manager||Free List Stalls/Sec||< 2||This counter measures the frequency at which requests for data buffer pages are suspended (aka “stalled”) because there are no data cache buffers. If this metric is greater than two, SQL Server needs more memory or is running T-SQL code wasteful of memory.|
|SQL Server: Databases||Log Flush Waits/Sec||~0||This counter effectively shows the number of times per second SQL Server must wait for pages to be written to the transaction log. Only highly stressed systems should have a sustained value greater than zero.|
|SQL Server: General Statistics||User Connections||Proportionate to Batch Requests/Sec||This metric shows the number of users currently connected to the SQL Server. For most workloads, it should roughly track with Batch Requests/Sec, rising and falling together. Taken with Batch Requests/Sec, it’s a good indicator of the overall busyness of a SQL Server.|
|SQL Server: Buffer Manager||Stolen Pages/Sec||Proportionate to Batch Requests/Sec||This metric shows the number of 8k pages “stolen” from the data cache to satisfy other memory needs, like plan cache and workspace memory. It’s a good indicator of the rate of data flow among SQL Server caches and should remain proportionate to Batch Requests/Sec.|
|SQL Server: SQL Statistics||Batch Requests/Sec||Proportionate to Workload||This shows the overall busyness of a SQL Server (i.e., the number of batch requests the database engine receives per second). High-end servers running modern versions of SQL Server can sustain tens or even hundreds of thousands of batch requests per second on a well-tuned workload. (Note: this counter doesn’t track all Hekaton transactions).|
|SQL Server: SQL Statistics||SQL Server: SQL Statistics||< 10% of Batch Requests/Sec||This indicator shows the number of times T-SQL compilations occurred per second, including recompiles. In general, for OLTP workloads, this metric should remain proportionate to the number of Batch Requests/Sec. High values may indicate excessive ad hoc querying due to poorly coded T-SQL.|
|SQL Server: SQL Statistics||SQL Re-Compilations/Sec||< 10% of SQL Compilations/Sec||This indicator shows the number of times per second T-SQL compiled code attempted to execute but had to be recompiled before completion. This number is preferably at or near zero, since recompiles can cause deadlocks and exclusive compile locks. For most workloads, it should remain in proportion to Batch Requests/Sec and SQL Compilations/Sec.|
This list is for quick reference only. For a comprehensive list of performance metrics used by SQL Sentry, check out this SQL Sentry documentation article.
The following wait statistics may be useful when creating a baseline or running a benchmark test. You need only collect the appropriate data from the sys.dm_os_wait_stats DMV while running your tests. You should also be certain to collect the time the data is collected, since wait statistics are cumulative and can’t give you a moment-by-moment record of their growth over time. As with collecting PerfMon objects and counters during a baseline or benchmark test, record the results to a location not on the system under test. In addition, recognize the polling frequency adds overhead. Collect only what you need at a somewhat lower frequency than you might with PerfMon object counters—every one to 15 minutes, for example. Here are a few details about the top wait statistics: