Deadlock Visualizations in SQL Sentry and Plan Explorer

AaronBertrand over 5 years ago 5 minute read time

In SQL Sentry Plan Explorer 3.0 Demo Kit , I touched briefly on a simple deadlock scenario, where we have two stored procedures that take the same locks, on the same tables, but in the opposite order. This is a classic deadlock scenario: one procedure has taken a lock on A and is waiting on the other procedure before it can access B, but that procedure first took a lock on B and is now waiting on the first procedure to release its lock on A. This is like Liza instructing Henry to fix the hole in the bucket using water he can only retrieve with the holey bucket.

Management Studio presents the information about this deadlock as follows :

SSMS deadlock diagram (click to enlarge)

Our software has, for ages, presented this deadlock in the following way :

Previous SentryOne / Plan Explorer deadlock diagram (click to enlarge)

Our goal was to provide a lot more information -- without having to parse through XML for key pieces of information, memorize cryptic numbers like a HobtID, or reverse engineer transaction descriptors. We also wanted to make it obvious which locks were taken in which order, even though in some examples (like this one), the order isn't crucial.

In SentryOne 18.3 and Plan Explorer 18.4, we've added several enhancements to make this visualization even more powerful. First, the updated diagram is much simpler and easier to read:

Updated SentryOne / Plan Explorer deadlock diagram (click to enlarge)

We've drawn the boxes in a more horizontal format, so that the details are much easier to read, and laid out the deadlock diagram itself in a much more intuitive pattern (circular, by default). This is already far less complicated to piece together; but wait, there's more!

Play it again, Sam!

Not only have we optimized the layout and made the nodes easier to understand, we've also added the ability to replay the deadlock, so you can step through and see each lock as it is taken. Left out of the capture above is the layout control at the bottom of the diagram:

d1-layout-1

With play/pause being the obvious one, the other replay controls are labeled here:

dl-replayControls-1

If you press the play button here, you'll see the diagram automatically cycle through these steps, simulating watching the deadlock happen in real time (use the controls here to step through the deadlock sequence; click any image to enlarge):

Step 1: SPID 64 takes an X lock on Invoices.

Step 2: SPID 63 takes an X lock on InvoiceLines.

Step 3: SPID 64 tries to take an X lock on InvoiceLines, but it's still locked by 63.

Step 4: SPID 63 tries to take an X lock on Invoices, but it has been locked by 64.

Step 5: This is unresolvable, so SQL Server chooses SPID 63 (arbitrarily, in this case) as the victim.

I made it very manual here, but you'll have to trust me that it is much smoother in the application than web-based JavaScript allows.

That's a neat trick, but what else does it do?

Replay can be a very helpful way to reverse engineer a deadlock, but we're not done yet. The other parts of the layout control include a zoom slider and auto-fit checkbox, which are fairly self-describing. The zoom slider zooms in and out, and the auto-fit checkbox tries to keep the entire diagram within the frame of the panel:

ab-dl-zoom-1

Then there is the Layout Type selector and a checkbox called Optimize Layout:

Deadlock layout controls (click to enlarge)

Let's pull up a more complicated deadlock (DeadlockOptimization.zip) to help illustrate what these options do. By default, the circular representation of this deadlock looks like this:

dl-circ-standard-1

Circular layout : optimized (click to enlarge)

There's a lot going on there, but much of it is noise. There is a whole bunch of contention on the table SqlPerf.Session -- session 342 is trying to perform an update, but it is stuck waiting on shared locks taken by two services. After choosing Optimize Layout (yielding the second graph), it is greatly simplified.

This checkbox is easily the most powerful option to discard noise and help you focus on the crux of the deadlock issue. In the original graph, you can see that many of the elements presented are simply innocent bystanders -- waiters that are captured as part of the deadlock activity, but in no way contributing to it. We can detect this in a lot of cases and so, when you check the box, we hide them from view, allowing you to focus much more directly on the key players involved in the deadlock. There is no question that eliminating the noise can really speed up troubleshooting; with those extra nodes removed, I can clearly see that I have some kind of order-of-operations issue on the SqlPerf.Session table, between the transfer service and the processor service.

The other two layout options are force directed and layered digraph. Force directed attempts to rearrange the diagram to "balance out" the representation. More detail is available here. Layered digraph | attempts to arrange the nodes in a graph horizontally, so they can be read left-to-right, top-to-bottom. Again, more detail here.

In the end, these options were added to give alternative graphical layouts to deadlock diagrams, because different people can interpret a lot from the relationships between nodes based solely on the graphical layout. You might prefer one of those views over another; in future posts, I'll take a look at some more complex deadlocks and see if there are use cases where the force directed or layered digraph options might make your work easier.

Conclusion

We are trying to make deadlock analysis more intuitive and hope these alternative and optimized graphical representations will help your troubleshooting. Try them out! Let me know what you think! And hey, maybe you like investigating deadlocks you've already manually captured. Wouldn't it be great to have a tool that automatically collects all of your deadlocks for you, and brings them to your attention based on your preferences?