SentryOne Performance Analysis Dashboard Tips and Tricks
The SentryOne Performance Analysis Dashboard is understandably one of the most-used views within SentryOne. The ability to view key metrics in real-time or historically and then easily drill down into what could be causing observed resource utilization makes the dashboard an obvious starting point within SentryOne. However, in the several years I have spent training SentryOne users, I find there are still dashboard features that are often overlooked. Let's dive into how to best utilize these dashboard features.
Table of Contents
On the CPU chart shown below, SQL Server is broken out into its own section (displayed in blue), while other processes are grouped into the Other category (displayed in green). This differentiation makes it very easy to see if a CPU spike is related to a SQL Server process or external process. When you see a spike in non-SQL Server processes, you can check the Processes tab to see which process outside of SQL Server might be the culprit. If you find your servers regularly or intermittently having CPU utilization spikes from a process outside of SQL Server, it can be beneficial to break out the process into its own process category.
Example of a Custom Process Group on the Dashboard
As you might be able to relate to, I often end up with dozens of Internet browser tabs open, which can use a good amount of resources. So, for demo purposes, I have broken out Chrome into its own process category. Doing so separates it into its own section on both the CPU and Memory charts (displayed in orange), as shown in the screenshot above. Breaking out certain processes can help you quickly identify if a well-known process starts to heavily utilize resources. Details on how to create new process categories can be found here.
If you find that most of your waits are showing up as the Other category, or you have specific waits that are most relevant to you, it can be very helpful to break waits out into custom categories. Doing so enables you to quickly identify what specific wait activity is tied to.
Below you can see a very common view of the Waits chart, with a decent amount of the waits activity being grouped into the Other category (displayed in gray).
Waits Chart Before Creating Custom Wait Category
After breaking out CXPACKET and BACKUPIO waits into their own categories, you can see in the screenshot below that it's much easier to quickly visually identify the breakdown of that wait activity.
Waits Chart After Creating Custom Wait Categories
I have also seen SentryOne users break out HADR waits if Availability Group (AGs) are involved, or specific latch waits when dealing with blocking issues. You can break out the waits that are meaningful to you. For information about how to create custom wait categories, check out Justin Randall's blog post "Dashboard Custom Wait Categories."
SentryOne provides several out of the box baselines (e.g., Previous Period, Day, Week, Month, Quarter, Year). Baselines can provide a quick glance at the available previous time ranges to assist in change analysis, trending, and pattern recognition.
A common use case is if you see a sudden spike or drop in a particular metric. Take Page Life Expectancy (PLE), for example. As you can see in the screenshot below, any time I see a major drop in PLE, I throw up the Previous Day and/or Week baselines to quickly see if there is a pattern to this behavior, which might be tied to a schedule process. The drop in PLE could certainly happen because of maintenance processes or could related to a heavy process that runs every day/week, causing objects in buffer to get flushed back to disk. If this behavior is causing issues for other critical processes at that time, you could reschedule that heavy process to a less impactful time.
Previous Day Baseline Showing that PLE Drop Also Occurred at the Same Time the Previous Day
Baselines can also be helpful for more macro situations such as comparing months/quarters before and after a major change (e.g., upgrading SQL Server). Note the larger resolution baselines will only become available when viewing larger ranges of time. See the SentryOne documentation for more details on performance analysis baselining.
You can also create custom baselines. Your custom baselines could be for something as small as an hour of heavy processing, before and after a change, to see if you've achieved the increased performance you were hoping for. Or, you could use them for something more macro, such as a lift and shift to the cloud. In that situation, you would want to create several baselines before and after during heavy load time ranges and/or entire days, weeks, etc.
To create a custom baseline for a specific time range, simply pull up the time range you want to baseline on the dashboard, highlight the time range, and click Create Baseline.
Creating a Custom Baseline for the Highlighted Time Range
Doing so will auto-calculate Min, Max, Avg, and Standard Deviation for all dashboard metrics in that time period. (Note that if you're interested in only calculating these values over a specific time period, you can click Create Baseline, note the values you care about, and simply close out the window without saving the custom baseline.)
If you want to perform change analysis, you will find all saved baselines in the Baseline drop-down menu. Selecting the ellipsis icon to the right of the drop-down menu (show in the screenshot below) when a custom baseline is selected will bring up the raw data for that baseline, as well as the option to delete them.
The Baseline Drop-Down Menu
For any selected baseline, only a single counter will be referenced/displayed on each chart at a time. You can change the scope of the counter data you see for the baseline by right clicking the counter legened (as shown below for Read versus Write latency).
Swapping the Scope of the Baseline Between Read and Write Latency
Something to be aware of is the option to make a baseline global. When not set to global, a baseline can only be overlaid where it was created. But in cases such as AGs or migrations, it could be beneficial to overlay a baseline created on one target onto another target's Performance Analysis Dashboard. Creating a global baseline will also make it available to Advisory Conditions for alerting or automated response purposes. See Lori Edward's blog post "Alerting, Baselines, and Standard Deviations" for more details.
Sample Mode provides more granularity related to the data being collected on the dashboard, displaying a detailed snapshot of the data SentryOne uses to plot on the historical charts. With Auto-Refresh enabled, you will see the Sample Mode update every 15 seconds or so. Sample Mode can be helpful when dealing with real-time issues, but what about getting granular details historically? You can view the Sample Mode for that historical data point by clicking a single point in time on the History view to select it, right clicking it, and selecting Sample Mode.
Selecting the Peak of a Spike and Jumping to the Sample Mode
One of my favorite use cases for viewing historical data samples is for digging into waits, as Sample Mode provides a more detailed view that is easier to explore if you have a lot of activity. Note the scroll bar at the bottom (shown in the screenshot below).
Waits Chart in Sample Mode
I have also found the Sample Mode view to be helpful when digging into parallelism or affinity settings. The ability to see individual core and socket utilization can provide insight into whether you need to adjust these settings.
CPU Chart in Sample Mode Showing Individual Core and Socket Utilization
Sample Mode provides additional counters such as User Connections, Memory Grants Pending, and Context Switches. Sample Mode allows you to see point-in-time values for these extra metrics. But what about historical graphs for these other metrics? Although these are not available on the dashboard, you can generate reports for these metrics. Simply right click the metric and select Quick Report.
Running a Quick Report for Sample Mode-Only Counters
The report will give you a historical view for that counter. It will typically default to a 30-minute time range surrounding the timestamp you were viewing in the Sample Mode. You can see Min, Max, and Avg over the time range calculated and can drill into the raw data to find timestamps/values associated with a major drop or spike. Selecting Refresh Report in the top left corner of the report will allow you to specify the time range you desire.
Generated Quick Report
Although this post is centered around the dashboard, I can't share the previously mentioned Quick Report option without taking it one step further. There are two very helpful reports that I find often get overlooked—the Performance Counter History and Performance Counter Date Range Comparison reports. SentryOne collects many more counters than just what you see on the dashboard, or even in the UI. You can enable the collection of even more counters than what SentryOne collects by default. See Justin Randall's "Enabling Additional Performance Counters" blog post for more details on how to do so.
The two reports I mentioned allow you to create custom reports with counters of your choosing and provide visibility into the wealth of additional counters SentryOne offers, including specific waits. The Performance Counter Date Range Comparison report takes it one step further, allowing you to see two different time ranges and the percent change between those time ranges. This report is great for change analysis or comparing month-end processing. You can get to these reports by going to Reports, clicking Performance Analysis, and then selecting Performance. I will go into more detail on these reports and others in an upcoming post.
I'm always amazed at how often the virtual layer is a black box to DBAs. It's like a car mechanic who is tasked with maintaining or fixing a car but can't open the hood to look at the engine and surrounding components. But nonetheless, this is often the case. If you're using VMware, you can add a vCenter. Doing so will provide a topology of the vCenter, automatically overlay additional counters on your SentryOne dashboard (Co-Stop %, Ready Time %, and Ballooned Memory), and provides additional conditions. One of the most critical conditions provided is VM Moved, which notifies you if a VM has vMotioned to another host.
Co-Stop, Ready Time, and Ballooned Memory Counters Added to Dashboard
The additional counters provide visibility into whether the performance issues you're encountering are due to host-level contention and/or a noisy neighbor situation, saving you time and providing evidence you can take to your infrastructure team. I have also heard a lot of horror stories caused by vMotion, so being aware of when a vMotion occurs can be very helpful. More detailed information about enabling vCenter can be found in Jason Hall's blog post.
Hopefully, this walk through of how to leverage Sample Mode, reports, baselines, and virtualization monitoring in SentryOne will make it easier to administrate and monitor your environments. As always, if you have any questions, please feel free to reach out to the SentryOne Support team or your Customer Success Manager.
If you haven't given SentryOne a try yet to explore the Performance Analysis Dashboard further, you can download a 14-day trial here.
Patrick is a Customer Success Engineering Manager and helps to provide customers with support services, troubleshooting, and defect resolution. Patrick also does some side work with SentryOne’s Professional Services team, who provide training and consulting. Patrick’s blog focus is on helping SQL Sentry users get the most out of our products. His hope is to ensure our users are as knowledgeable and comfortable as possible with SQL Sentry products, so they have as much ammo as possible to solve real world SQL Server problems.