SQL Sentry v10 : Index Analysis
When Jason Hall (@SQLSaurus) announced v10 earlier this week, I will admit, I had some goosebumps. I knew that a new module that we had been working on for many months was finally going to see some daylight. Nothing is more exciting than once again blowing the doors off the native tools and providing you with another feature that helps you tune queries that much faster.
As SQL Server people first and foremost, we saw the pains often encountered by customers or colleagues troubleshooting queries and analyzing execution plans. We knew that many tuning efforts ultimately lead to more questions than answers; questions like:
- Was the chosen index the most optimal for this operation?
- How close was the chosen index to covering the query?
- What other indexes exist on this table?
- Would a missing or recommended index have fared better?
- Were estimates off because stats were out of date, or for a more elusive reason?
- Would different parameter values have yielded a different plan?
- What do the statistics histograms look like for the relevant columns?
- Do the parameter values help to indicate data skew or ascending key problems?
Our new Index Analysis tab can help answer these questions with much less manual legwork, enabling you to focus on your query's performance instead of wasting valuable time gathering facts and metadata.
Before getting into any details, I should tell you how you can access this new feature, in case you want to follow along. Index Analysis has been made available first in SQL Sentry v10, but it will later be the flagship feature in our next version of Plan Explorer. If you haven't upgraded to v10 yet, you should check out Jason's post for a great overview, and even just browse the change list. Not a SQL Sentry customer (yet)? Take it for a spin today!
Currently, you need to generate a new actual or estimated plan in order to access the Index Analysis tab. Due to performance and storage overhead, it just wasn't feasible to capture all of this data with every single plan we capture, but we are investigating ways to collect this valuable information for you in the background, too. (In the new unified Plan Explorer, there also won't be any Index Analysis data if you open a plan generated in SSMS or a previous version of Plan Explorer, since that data wasn't collected at runtime.)
You can start a new integrated Plan Explorer session in the SQL Sentry Client by going to File > New Plan Explorer Session.
This is just a quick example of the output in the Index Analysis tab. Using a pretty standard installation of the AdventureWorks sample database, I created the following stored procedure:
CREATE PROCEDURE dbo.IndexAnalysisDemo @MinProductID int, @MaxProductID int, @MinSalesOrderID int, @MaxSalesOrderID int AS BEGIN SET NOCOUNT ON; SELECT ProductID, SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, UnitPrice FROM Sales.SalesOrderDetail WHERE ProductID BETWEEN @MinProductID AND @MaxProductID AND SalesOrderID BETWEEN @MinSalesOrderID AND @MaxSalesOrderID; END GO
I experimented with calls to the procedure, choosing a variety of different values for each of the parameters. On several of the iterations, the following graphical plan was produced:
This looks like a pretty efficient plan; after all, it's a seek, right? Well, we can see by the red text for the row counts that our estimates are way off. And when the underlying SQL Server version supports it, our graphical plan now shows a handy warning for things like residual I/O - if you hover over to see the tooltip, you'll find this toward the bottom:
We'll treat this in more detail in a future post. For now, I started to wonder if my seek was such a good thing after all. Our new Index Analysis feature couldn't have come at a better time! I moved to the Index Analysis tab, and I saw this:
This is just telling me that I need to select a single index operation (Seek or Scan), either in the above drop-down or in another tab, in order to show relevant information in the Index Analysis tab (there is just way too much information to display multiple operations together). Like all of the grids in the integrated and standalone Plan Explorer product, this tab is also context sensitive, so its display depends on what might be selected on other tabs. I selected the clustered index seek, which happened to be the only relevant operation in this relatively simple plan, and the tab lit up (click to enlarge):
It's a lot to take in on first glance, so let me first highlight the different functional areas on this tab, then I'll dig into each one (again, click to enlarge):
Now I'll explain these four areas I've highlighted:
What I labeled as the Columns Grid and Indexes Grid are actually all part of the same grid control, but I thought I would explain them separately. The columns side of the grid shows all of the columns on the left side, with bold text for the columns used in the query in some way - as output columns, join criteria, or filters. Currently, this shows all columns involved with the query or at least one index, but in the future, we will show all columns in the table. When column-level statistics are available, we'll show details like density, the last update, and average length. (When not available, you'll see
? in their place.) We'll also compute the estimated size for you, and indicate whether the column is used for output. The next two data points, if the columns are involved in a sort of any kind, show sort direction (
DESC) and ordinal position of the column within the sort. For example, if we ran with a bad idea and added the following to the above query:
ORDER BY SalesOrderID DESC, SalesOrderDetailID;
We would see data for those first two rows in the sort columns as follows:
The last column in this grid shows the predicate(s) used against each column, if any; this can include both join and filter conditions, and if more than one predicate exists, they will be comma-separated.
Along the top you can see that there is a display column for each index (my screen shot doesn't show the column names; they're at the left of the entire grid). If there is a clustered index, it will be listed first, on the left.
Following that will be the index that was actually used for the currently selected operation (if it wasn't the clustered index). In either case, the selected index will have its name in bold.
Next, we'll show any missing indexes if they were suggested; in some cases, we will recommend an index, but we'll always defer to SQL Server if it provides a missing index suggestion (that wasn't the case here). Sometimes there won't be a missing index suggestion from SQL Server *or* a recommended index from us.
After that, we'll list all the other indexes on the table, ordered by score: highest on the left, lowest on the right.
Wait, what is this score you speak of?
I thought I would introduce that quietly to make sure you were still paying attention, even though this is really one of the defining concepts of this feature. :-) With a tremendous amount of input from Paul White (@SQL_Kiwi), we have developed an algorithm to score each index, based on a number of factors - density/selectivity, whether they cover the query, whether they support a seek, and a variety of others. The color behind the score is scaled in an intuitive way, even if you're not already familiar with our software: green is good, yellow is okay, red is bad. You may note that the only decent score here has been assigned to an index that doesn't even exist.
It is also important to note that a score of 100% is not always attainable, and that the scoring algorithm can change over time.
Beneath the score there are rows for each column in the table; therefore there is a cell for each column in each index. We attempt to relay a lot of information visually here, partly through color, and partly through data. The colors have the following meanings:
|Grey||The column is either part of the clustering key (so it is automatically included in all non-clustered indexes, and will be indicated as such with the word Included), or the column is part of the clustered index (in which case, under the clustered index, it will be indicated as such using the word Data).|
||Green||The column is used in the query, the current index key covers the column, the ordinal position is favorable, and a seek is likely. The darker green usually means it's the leading column in the index and it has a predicate applied.|
|Yellow||The column is used in the query and is either an
|Red||The column is required by the query, either as an output column, a filter, or part of an expression, but is not covered by either the key or the
|White||The column is not required by the query.|
Text in the cell could be a number, which indicates ordinal position within the index key. If the cell says "Included" then it is part of the
INCLUDE list. If the cell is blank, then the column isn't part of the index.
Note that you can change that. In the indexes grid you can make hypothetical or real changes to existing indexes, test the feasibility of new indexes, or even update stats directly without ever leaving this tab.
Improve the score of existing indexes
In any cell you can change the ordinal position of a column, move it to or from the
INCLUDE list, or remove it from the index altogether (by choosing the blank option in the drop-down). Don't worry, these changes don't go in and muck with your indexes as you click around in real time; in fact that's not really the purpose.
You can experiment with the estimated impact of changing existing indexes right here in the grid, to see if minor tweaks to those indexes could improve their score. As you make beneficial changes to an index, you will see its score at the top of the column change. You can then consider making actual changes to the database by scripting out a drop and re-create of an index with your changes included. For example, let's say I wanted to "improve" the
IX_SalesOrderDetail_ProductID index (currently scored at 80%) by including the three other columns required by the query. This improves the index score to 100%, as you can see to the right (from a separate screen shot of course):
Create a new index with a better score
The last index is entirely blank except for a
+ button at the bottom; this column is here so you can create your own index. In this case, there is not much point - it's not possible to do better than the index SQL Sentry has already recommended. But still, for the sake of experimentation, let's hit the
+ button and create a new index that mimics the recommended index to some degree, with a different key column. Here it is easy to obtain that 100% score as well, and I can hit the
<s> button at the bottom of the column in order to generate a script for the new index:
Once I've scripted the index, I can make changes to it, and I can either copy it from this dialog to run elsewhere, or I can immediately execute it. Note that I can use the same
<s> button at the bottom of any index in the grid to inspect or directly change the index after I've made any changes.
Update statistics directly
There is also an Update Statistics button, which looks like this: (). Under any index I can choose to manually update statistics if I notice that the last stats update was a long time ago, or if I know about recent underlying data change that might not get picked up by auto-update statistics. Depending on the options you choose, you may get a warning about performance impact:
At the bottom of the grid, above the buttons, are two other pieces of information: when the index statistics were last updated, and the estimated size of the index.
Between the ability to update statistics and the immediate feedback of the index score, this can be a very powerful way to gauge the potential impact of new or modified indexes. This can effectively serve as a sandboxing environment where you can consider the effect of these changes without actually having to create new or hypothetical indexes (or touch the underlying instance of SQL Server at all).
The exercise can also help you identify the most expensive columns in your index (which prevent you from getting a better score), and consider removing those from the query; or the least selective columns in your index (which can guide you in moving them away from the key).
If you're already familiar with Plan Explorer, this grid will look a little familiar, as it will show the compiled and runtime values of every parameter. But there is some extra functionality here as well, including the total number of rows predicted, and the estimated operation that will take place with those values.
In some cases, the compiled parameters may expect to yield a seek, while the runtime parameters may expect to yield a scan, or vice versa, due to factors such as "the tipping point." Note that these are educated guesses, not guarantees of what will happen when a plan gets generated - in fact when the values fall within the grey area of the tipping point, we'll place a
? there instead.
There is a third column called "Test Values" which, you may have guessed, allows you to test completely different parameter values, and generate a new estimated execution plan (this will refresh the entire session with the new, estimated plan). If you only want to change a single parameter value, you can populate the rest with either the compiled or runtime value by clicking the small arrow next to each value:
Finally, the Histogram allows you to visualize data skew and easily identify potential problems with the data distribution for the leading key column of the selected index. The above shows the distribution of values for the
ProductID column, and you can see how the runtime parameter values are shown through a range overlay.
Handy tooltips show you all the information you're used to parsing endlessly from the output of various
DBCC commands and DMVs. You will be able to easily see when a
NULL or other token value represents a large portion of your data set, and know at a glance which parameter values you should be testing for the best chance at capturing parameter sniffing, ascending key, or other plan variation problems.
This is a value-packed feature, and in upcoming posts, I plan to dig deeper and give you more detailed and practical examples to work from. And as a reminder, all of this functionality will be available in a future version of Plan Explorer.
Aaron (@AaronBertrand) is a Data Platform MVP with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and serves as a community moderator for the Database Administrators Stack Exchange. Aaron's blog focuses on T-SQL bad habits and best practices, as well as coverage of updates and new features in Plan Explorer, SentryOne, and SQL Server.