New Deadlock Visualizations in SQL Sentry and Plan Explorer
Published On: August 15, 2018
Categories: Deadlocks, SQL Sentry, Plan Explorer 2
In the Plan Explorer 3.0 Demo Kit, I touched briefly on a simple deadlock scenario, where we have two stored procedures that take the same locks, on the same tables, but in the opposite order. This is a classic deadlock scenario: one procedure has taken a lock on A and is waiting on the other procedure before it can access B, but that procedure first took a lock on B and is now waiting on the first procedure to release its lock on A. This is like Liza instructing Henry to fix the hole in the bucket using water he can only retrieve with the holey bucket.
Management Studio presents the information about this deadlock as follows (click to enlarge):
Our software has, for ages, presented this deadlock in the following way (click to enlarge):
Our goal was to provide a lot more information -- without having to parse through XML for key pieces of information, memorize cryptic numbers like a HobtID, or reverse engineer transaction descriptors. We also wanted to make it obvious which locks were taken in which order, even though in some examples (like this one), the order isn't crucial.
In SentryOne 18.3 and Plan Explorer 18.4, we've added several enhancements to make this visualization even more powerful. First, the updated diagram is much simpler and easier to read (click to enlarge):
We've drawn the boxes in a more horizontal format, so that the details are much easier to read, and laid out the deadlock diagram itself in a much more intuitive pattern (circular, by default). This is already far less complicated to piece together; but wait, there's more!
Play it again, Sam!
Not only have we optimized the layout and made the nodes easier to understand, we've also added the ability to replay the deadlock, so you can step through and see each lock as it is taken. Left out of the capture above is the new layout control at the bottom of the diagram (click to enlarge):
With play/pause being the obvious one, the other replay controls are labeled here:
If you press the play button here, you'll see the diagram automatically cycle through these steps, simulating watching the deadlock happen in real time (use the controls here to step through the deadlock sequence; click any image to enlarge):
That's a neat trick, but what else does it do?
Replay can be a very helpful way to reverse engineer a deadlock, but we're not done yet. The other parts of the layout control include a zoom slider and auto-fit checkbox, which are fairly self-describing. The zoom slider zooms in and out, and the auto-fit checkbox tries to keep the entire diagram within the frame of the panel:
Then there is the Layout Type selector and a checkbox called Optimize Layout:
Let's pull up a more complicated deadlock (DeadlockOptimization.zip) to help illustrate what these options do. By default, the circular representation of this deadlock looks like this:
There's a lot going on there, but much of it is noise. There is a whole bunch of contention on the table
SqlPerf.Session -- session 342 is trying to perform an update, but it is stuck waiting on shared locks taken by two services. Now, let's check the Optimize Layout box above, and look at the circular graph again. Simplified, right?After choosing Optimize Layout (yielding the second graph), it is greatly simplified.
This checkbox is easily the most powerful option to discard noise and help you focus on the crux of the deadlock issue. In the original graph, you can see that many of the elements presented are simply innocent bystanders -- waiters that are captured as part of the deadlock activity, but in no way contributing to it. We can detect this in a lot of cases and so, when you check the box, we hide them from view, allowing you to focus much more directly on the key players involved in the deadlock. There is no question that eliminating the noise can really speed up troubleshooting; with those extra nodes removed, I can clearly see that I have some kind of order-of-operations issue on the
SqlPerf.Session table, between the transfer service and the processor service.
The other two layout options are force directed and layered digraph. Force directed (standard | optimized) attempts to rearrange the diagram to "balance out" the representation. More detail is available here. Layered digraph (standard | optimized) attempts to arrange the nodes in a graph horizontally, so they can be read left-to-right, top-to-bottom. Again, more detail here.
In the end, these options were added to give alternative graphical layouts to deadlock diagrams, because different people can interpret a lot from the relationships between nodes based solely on the graphical layout. You might prefer one of those views over another; in future posts, I'll take a look at some more complex deadlocks and see if there are use cases where the force directed or layered digraph options might make your work easier.
We are trying to make deadlock analysis more intuitive, and hope these alternative and optimized graphical representations will help your troubleshooting. Try them out! Let me know what you think! And hey, maybe you like investigating deadlocks you've already manually captured. Wouldn't it be great to have a tool that automatically collects all of your deadlocks for you, and brings them to your attention based on your preferences? Give SentryOne a try!
Aaron (@AaronBertrand) is a Data Platform MVP with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com. Aaron's blog focuses on T-SQL bad habits and best practices, as well as coverage of updates and new features in Plan Explorer, SentryOne, and SQL Server.