Visibility Into Distributed Availability Groups With SQL Sentry
I began my career as an associate software development engineer in June of 2020, and during my short time in the industry, I’ve had the opportunity to build and troubleshoot continuous integration and continuous development (CI/CD) pipelines, work on many different technologies within several SolarWinds®, formerly SentryOne, products, and learn proper engineering practices. My first major contribution to the development of a new feature was to add distributed availability group support to AlwaysOn Management within SolarWinds SQL Sentry®. As someone who was only familiar with the business value Always On availability groups provide a few months ago, the concept of high availability data replication seemed practical enough. Availability groups guarantee important data will be replicated in several locations and will remain available to the application relying on it even if a failure occurs at one location. With the introduction of distributed availability groups, the benefits of replication and high availability are extended to the availability groups existing on different clusters.
However, as these availability group environments grow and distributed availability groups are configured to handle replication at the group level, the work involved to monitor and maintain these environments becomes increasingly difficult. Native tools, like SQL Server® Management Studio (SSMS), are limited by the amount of data available to the connected database engines. Depending on which replica in an availability group you’re connected to, this data may be incomplete and even misleading.
For example, I have a distributed availability group (DAG-01) with two availability group replicas (AG-01 and AG-02).
The two SQL Server instances below are both primary replicas in their respective availability groups.
SSMS can show us both replicas are within an availability group and a distributed availability group. However, when we look at these images side by side, we see some inconsistencies in the information provided.
- Why are both AG-01 and AG-02 listed as (Primary) in each view?
- Why does 008 show the role details of all availability group replicas within DAG-01 but 005 only shows details on AG-01?
Even more inconsistencies begin to appear when you open the Always On view of the secondary replicas within the same two availability groups.
Not only do we see the same misleading tag of (Secondary) on both AG-01 and AG-02, now we no longer see any reference to our distributed availability group.
The lack of consistent information across all replicas can make monitoring and maintaining distributed availability groups through SSMS time-consuming and difficult. This difficulty is the driving force behind the latest addition to the SQL Sentry AlwaysOn Management view: distributed availability group support.
For this example, I’m using the same distributed availability group environment I described above.
This view may be familiar to you if you’ve ever used the AlwaysOn Management view in SQL Sentry. In this view, every row represents an availability group, and each column represents a replica within it. As a part of distributed availability group support, we’ve added a new “Distributed AG” column.
If an availability group is associated with a distributed availability group, a node will appear in this column representing the linked availability group. In this case, we see AG-01 is linked to AG-02 within the DAG-01 distributed availability group. The other information in this node includes the linked availability group’s health and primary replica.
In this layout, an animated “pipe” of data flow runs between the distributed availability group node and the primary replica of AG-01. This pipe represents the amount of data flowing between the two availability groups and will flow in the direction of data flow (Distributed Node => 005, the forwarder, in this case).
If we switch the WSFC context to look at AG-02, the distributed node and connection link reflect the relationship between the two availability groups in the opposite way.
The text has changed within the node to reflect the new linked availability group and primary replica, and the connection link now flows from 008 to the distributed node to represent the correct direction of data flow.
In an extension of existing functionality, the animated connection links will continue to work between the distributed nodes and their associated replicas.
An Extension of the Connection Links
In addition to the direction of the flow animation, the thickness of the green bars will indicate how much data is flowing compared to the max amount of data recorded. In this example, the green bars almost fill the pipe because the KB to Replica/sec metric is currently at 94% of its maximum value.
Roan Urquhart is an Associate Software Development Engineer at SolarWinds. With a recent focus on build and release pipelines, Roan works to streamline CI/CD processes to ensure they’re an asset, not an efficiency burden.