7 Features to Look for in a SQL Server Monitoring Solution
Many purchases you make during your life are emotional choices—buying a house or a car are near the top of that list for most people. Yes, you often still set a budget for these purchases, but you are more likely to break it based on an emotional connection.
To see how emotions play into other purchases, such as software purchases, we need to understand a little bit about how the brain works. Don't worry, this blog post isn't a deep dive into psychology. All we need to know is that there are many areas in the brain that help process different thoughts, emotions, and actions.
The two areas of the brain we are most interested in are:
- The limbic system, which is associated with emotional processing
- The cerebral cortex, which is associated with analytical thinking
When you fall in love with a house or a particular model of car, it's usually the limbic system in your brain that takes over. This is the area of the brain that thinks it's a great idea to buy a two-seat red convertible sports car and that it will be perfectly acceptable to your partner and two young children!
Often, when data professionals are searching for a database performance monitoring tool, the first thing they think about is identifying poorly performing queries. The limbic system gives you the positive feeling telling you that you have found the right tool to address your pain point. But is that enough?
Take a second to analyze your options. Shouldn't you demand more from a monitoring solution, such as the ability to look deeper into the monitored platform to discover other problems? This is the cerebral cortex at work.
SentryOne SQL Sentry was designed and built from the ground up by DBAs. Simply put, it's a tool by DBAs for DBAs. And who knows better what a DBA needs than a group of DBAs?
In this blog post, I'll discuss 7 features you should look for in a SQL Server performance monitoring solution. We’ll separate the limbic and cerebral thought processes to allow you to make an informed buying decision.
Depending on the size of your SQL Server estate, you might think scalability isn't important. What use is being able to scale to more than 1,000 instances on a single repository if you only have 10? Let's look at it another way. If there were two cars in a showroom, and one had a top speed of 90MPH and another of 150MPH, which would you choose? Personally, I would go for the faster car (no jokes about a mid-life crisis, please) and not because it's faster per se but because it has been engineered to a higher degree and there could be times in the future in which I need that extra power and speed.
If we take the analogy further, do you think you could attain the top speed when the 90MPH car is fully laden for a family holiday? Absolutely not!
Pretend that the suitcases are three years’ worth of collected data (assuming the monitoring solution will store that much history). Which user interface do you think is designed to cope better in that scenario? One designed to cope with 150 instances or one designed to cope with 1,000?
In this analogy, we have mixed the two different areas of the brain. Your limbic system might have been seduced by the speed, while the cerebral cortex might have been calculating your relative fuel costs. However, by taking a step back and thinking about what each option offers, you will see that, in this scenario, scalability also equates to practicality.
2. Database Corruption Detection
Imagine you’re sitting in the 150MPH car admiring the dashboard. It might light up in your favorite color and display welcome messages. Your limbic system absolutely loves it, but you have a nagging feeling and your cerebral cortex takes over. Where is the check engine light and other key indicators?
In this scenario, the check engine light is your visibility into corruption in your databases. Just as you would panic if a check engine light came on, SQL Server logs when there is page corruption. A good monitoring solution will alert you to this problem and stop you from racing off at 150MPH and causing irreversible damage to the car.
Having seen hundreds of SQL Server environments over the years, it's still not an easy task to tell people that their critical production databases are corrupted. It's understandable to a degree if there is no monitoring solution in place, but there really is no excuse when a company already uses another monitoring solution, whether it’s a more generic monitoring tool or a SQL Server monitoring solution. In truth, if the tool cannot tell you that you have database corruption, it wasn't worth installing in the first place.
I remember discovering database corruption while onsite with a team of maybe 10 people and sharing it with the group on the big screen. Three people left their seats, went outside the room, and then there were raised voices. I won’t share the language that was used but, suffice to say, there were some unimpressed people and they changed their vendor of choice for monitoring their SQL Server estate.
Alerting in a car generally provides two useful mechanisms to know there is an issue. The first is a light somewhere on your dashboard and the second is an audible tone. As cars become smarter, there will no doubt be virtual assistants and smartphone apps warning you of issues, but for now, most of us are left with these two options.
There is a third alert, of course—a horrible noise, which at some point you realize is actually coming from your car. This is the equivalent of a user notifying you that there is a problem. It's not a nice position to be in!
Some solutions are best described as analysis tools rather than monitoring solutions. They might have a few alerts available to check a box in a feature matrix but not enough to be usable in the real world.
A monitoring system will not only offer template-based alerts, where you can change the alerting threshold, but also more configurable alerts, where a user can choose when they want to be alerted. After all, there might be times in which the typical alert would just be noise and not warranted.
The next step up in the monitoring solution hierarchy will give you the option to create custom alerts, with a variety of data sources to choose from and the ability to mix and match these data sources to form the perfect alert.
Receiving an alert still seems quite 20th century to me. Shouldn't solutions automatically act on these issues by now? Some solutions provide the ability to specify actions in response to your chosen alerting scenario. But imagine being able to fail your Always On Availability Group over to its secondary automatically when a memory leak has been detected—it's not beyond the realm of possibility.
The evaluation of alerting features is an area in which your cerebral cortex will take the lead. Your limbic system is likely to gloss over this capability (pausing only long enough to make note of its existence) in favor of other areas. Unless, of course, you have been burnt in this area before. Then, your limbic system might look for alerting capabilities that had been missing in other monitoring solutions. It’s the “Oh wow, I could have been doing this for ages” feeling.
4. A Variety of Metrics to Analyze
If you have owned a few cars before, you might have a brand that you go back to time after time. If it has worked for you in the past, it should work again, right? That’s your emotional side, the limbic system, taking control again.
A lot of people have the same attitude toward troubleshooting. If it worked once this way, then I'll always approach it this way. There are circumstances in which the symptoms are similar, and it makes sense to use the same approach, but there are other times when that course of action is completely inappropriate. You will find yourself making knee-jerk reactions and jumping to the wrong conclusion.
For example, imagine that you jump in a friend's car, turn on the ignition, and press the accelerator and the engine revs but nothing else happens. In exasperation, you exclaim, "Your car is broken!" With a wry smile, your friend replies, "No, this is a manual transmission, not an automatic, so you need to put it into gear first."
A developer might want to look at only the queries that are running, a DBA the wait statistics, a SAN admin the disks, and a VMware admin the hypervisor. The truth is, you will need to take all of these things and more into consideration when troubleshooting a database performance problem, and you need to be able to view performance metrics in an easy to visualize way. You shouldn't have to look at memory on one screen, IO on another, and wait statistics on yet another. This information should be visible in one place to help correlate the root cause of the problem, not to encourage you to make bad choices based on previous endeavors.
A great monitoring tool will also show you baselines, making it easy to see where values are changing so that you can identify areas that might be a cause for concern. By viewing all the high-level relevant information in one place, you will be engaging your cerebral cortex, ensuring no knee-jerk reactions from the limbic system.
5. Virtual Log Files
At a high level, Virtual Log Files (VLFs) are structures within your transaction log. VLFs help SQL Server keep track of which data can be overwritten when the transaction log is being reused and which data needs to stay to make sure data remains transactionally consistent in the event of crash recovery.
Let’s go back to our car analogy. You have been driving your car for some time now and it needs a little TLC. You decide to drain the old oil and put some new oil in the engine to make sure the engine continues to run smoothly, and nothing becomes damaged. You walk into a store to buy the oil for your car, and you see the oil you need and then notice there’s a large bottle and a much smaller bottle, which is approximately a quarter of the size. They're about the same price for the same volume, the larger being slightly cheaper overall. You wonder to yourself why anybody would want to buy the smaller option, as you would need several bottles. The smaller bottles would also take up more space in your car on the way home, take longer to open and pour the oil into the engine, produce more waste, and cost more!
You probably see where I am going with this. If you have many small VLFs, they will take up more space, reduce transactional throughput, and cost more in disk space and processing time, which is really expensive in the cloud. This problem will be massively exacerbated when also combined with High Availability technologies such as Always On Availability Groups in synchronous mode, as the excess VLFs could hamper throughput on both nodes, causing your transaction to commit even more slowly.
If your monitoring solution doesn’t provide information about the number of VLFs you have, you would struggle to know about this issue. If you were reliant on using wait statistics and wait statistics alone, then INSERT, UPDATE, and DELETE performance might show the WRITELOG wait being prevalent. The same database performing a backup would have a higher BACKUPIO wait statistic. There would be no clear path linking these two waits together. Most people would assume incorrectly that it was an IO issue rather than a database configuration issue.
You might wonder how the limbic system and cerebral cortex fit into this scenario. Again, the limbic system’s initial response will not be a positive one. It probably doesn’t even know that this is an issue because analysis tools that only deal with wait statistics won’t pick up an area concerned with the configuration settings within a database. Instead, it will be swept along with the emotional response of that big spike of waits that a query has accrued. The cerebral cortex will be more reasoned and look for correlation with other metrics to provide the right answer.
6. Support for Other Technologies
Just as the type of terrain you live in or near might affect the type of car you choose to buy, the technologies you use should affect your choice of monitoring solution.
A family hatchback might be the most practical option overall, but if its primary role is to race, then it just wouldn't be up to the task. It's the same with monitoring. There are solutions that claim to support pretty much everything, but they will not be able to provide enough detailed, actionable information to support you in your endeavors.
There are several specific layers that you as a data professional need to monitor. The following platforms have very different needs from other RDBMS platforms.
- Hypervisor—There was a time when hypervisors were not trusted; now, it is seen as almost strange not to host your database on a hypervisor. You will need insight into this layer to see if other virtual machines are causing your SQL Server issues.
- Microsoft SQL Server—This goes without saying but it's the core database engine that we all know and love.
- SQL Server Analysis Services (SSAS)—Few vendors will monitor SSAS in depth, as it is not used as much as the core SQL Server engine.
- Cloud offerings (e.g., Azure, AWS)—We are now at the point with cloud platforms that we were 15 years ago with virtualization. People are beginning to see the benefit of moving to the cloud, and we are now seeing more adoption.
You will need a monitoring solution that can cover all these platforms and provide performance information for them in the same interface with the same alerting engine to reduce your total time to value.
Assess where your environment is right now in terms of its infrastructure and converse with your colleagues to estimate where you will be in the next 3 years. Will you still be on-premises or do you think you will have moved some or all of your workload to the cloud?
It’s easy to be swept up in emotional attachments based on current or recent issues. Your limbic system does not want you to go through that pain again, and that’s a great thing. However, engaging the cerebral cortex will ensure that any systems that you implement are future proofed and prevent you from suffering pain both in terms of time and budget when you might have to change vendors again due to the original vendor not supporting all of your infrastructural requirements.
Let's revisit the car analogy one more time. Assuming you found somewhere safe to hit the car’s top speed, how frequently during your test run would you check your route ahead for potential hazards?
To make sure that everything is safe, you would always be surveying the horizon. However, you also need to regularly check on the car to make sure it's running OK. There are many potential metrics you could look at but, in this example, we will just use the speedometer to check our speed.
To assess the impact that visibility—or rather a lack of visibility—can have, let’s compare how far we would travel when checking our speed every 10 seconds versus every 5 minutes. At 150mph, that would equate to 2.5 miles per minute, which means that the following would be true:
Frequency of speed check
I'm not sure about you, but I'm quite attached to my limbs and I would like them to stay attached to me. If I was hurtling around a racetrack at 150mph, I would be checking my speed every few seconds rather than every few minutes. You should feel the same about monitoring your SQL Server estate.
In 5 minutes, I could restart a machine, perform a failover, or miss all kinds of issues that a user might (quite rightly) have a problem with. Without granular visibility, you will be visited by a horde of angry users armed with pitchforks wanting to know what is happening. Worse still, you won’t be able to tell them.
Granularity is another area that is typically left off the feature matrix when comparing solutions. It’s a cerebral cortex requirement, not one of the limbic system’s emotionally driven must-haves—unless you have experienced data gaps before.
In this blog post, we looked at 7 features and benefits that often aren’t immediately thought of when shopping for a database performance monitoring solution. The desire for them is only driven when we realize that we don’t have them but we really, really need them.
Hopefully, by splitting the decision process into what happens in your limbic system and cerebral cortex, in conjunction with the long-running car analogy to simplify the requirements, you will be able to make a more informed decision when selecting your next SQL Server monitoring solution.
Richard (@SQLRich) is a Principal Solutions Engineer at SentryOne, specializing in our SQL Server portfolio offering in EMEA. He has worked with SQL Server since version 7.0 in various developer and DBA roles and holds a number of Microsoft certifications. Richard is a keen member of the SQL Server community; previously he ran a PASS Chapter in the UK and served on the organizing committee for SQLRelay.