Top 10 SQL Server Mistakes Made by DBAs
SQL Server DBAs—even those who have years of experience with relational databases—frequently make the same mistakes repeatedly. Wouldn’t you like to learn about those blunders so that you can sidestep them?
Well, I have good news!
In a recent webinar, I counted down the 10 most common SQL Server mistakes DBAs make. Read below as I recap the full list and share helpful resources for avoiding these common missteps.
Top 10 SQL Server Mistakes
10. Storage and I/O—DBAs frequently think about disk subsystems only in terms of disk space, not I/O load. Without this knowledge, the following problems occur:
- Inadequate fault tolerance
- Insufficient I/O
- Poor choice of RAID type, controllers, and channels
- Not enough disk spindles
If you’ve ever been caught off guard by insufficient data storage space, check out this on-demand webinar on How to Predict Your Future Storage Needs.
9. Business ignorance—As the liaison between business and IT, you should know how—and in what ways—your servers are used to support organizational business needs. You should know:
- Who is affected if this app is down?
- How much does the downtime cost the company?
- What are the business cycles?
- When are the best downtimes?
- Baseline? Benchmarks? What is normal?
8. No troubleshooting methodology—When the chips are down, DBAs need a strong, step-by-step methodology for root-cause analysis. Without one, you get:
- Missed errors and problems
- Errors resulting in data loss and catastrophic failure
- Poor response times and breached SLAs
- Lost credibility
The following resources can help you overcome your troubleshooting challenges:
- Blog: End-to-End Troubleshooting Checklist; Slides, Scripts, Video, and Articles
- eBook: Troubleshooting SQL Server Performance
7. Default configuration settings—SQL Server installation defaults are intended to get the server up and running, but not running optimally. It’s important to know what they are and why you may or may not want to change them. Check out this SentryOne blog, which covers three SQL Server default values that should be changed.
6. Security as an afterthought—Most DBAs consider security to be a problem that other people in the IT organization handle, perhaps the Active Directory administrators or maybe even the Dev team. Yet, SQL Injection is the #1 hack on the internet today that causes monetary damages. While there are lots of things you can and should do to secure your SQL Servers, you should at least take the time to determine who is managing security for important corporate workloads. Sadly, most DBAs don’t even ask questions about who is managing security. So, what’s the game plan for minimizing the issue?
- Ensure the least privileges principle for applications running on your servers.
- How much surface area do your servers expose?
- Who has access to your servers?
- How do you find out the who, what, and when of a breach?
5. Inadequate automation—Automation is the means by which DBAs work “smarter,” not “harder.” Ironically, it takes a lot of work at the outset to implement automation. A few examples of automation that can help DBAs include:
- Automating error notifications
- Scheduling jobs
- Lots of scrips, not too much GUI
PowerShell is a great way to tackle automation and scalability. Use the https://dbatools.io library for tips and best practices for getting started with PowerShell.
4. Wrong feature or technique for the job—DBAs are the “performance and reliability engineers” for most corporate IT applications. It’s imperative that DBAs apply the most appropriate feature to each business requirement. Yet many IT professionals spend little time thinking about architecture and feature implementation that is the lowest risk. This often translates into IT professionals who enjoy the “shiny” new features that come out with each new release.
- However, these features are often only partially implemented-at least that’s a common occurrence on v1 releases of features on SQL Server. That means they may have more support issues, require more kluges and workarounds, and otherwise introduce risk into your newest project. When you use the wrong feature or technique for the job, you’ll have:
- Brittle applications
- Complex applications
- Excess resource consumption
- Design that reflects the current “fad”
Keep in mind there are no IT projects—there are business projects that are solved using IT.
3. Apathy about development, DevOps, and change management—Many DBAs consider all T-SQL code as the realm of the development team. As the owner of database performance, that can be a self-defeating attitude. DBAs have to stay on top of the code that’s running on their SQL Servers. In fact, it’s important to request that the development team perform code reviews to at least identify worst practices. Also, beware of vendor apps and ORM-generated apps since they are widely-known to provoke serious problems, such as implicit conversion issues, into the code they create. For more information on this, see my session, ”How to Assess the Technical Quality of Apps Running on Microsoft SQL Server.”
- In addition, if you’re apathetic about the Dev team, you’re probably not paying a lot of attention to code deployment or change management. Again, this can be a very problematic issue if you ever get bitten by the need to roll back a significant code deployment. Why? Usually, code deployments can be very difficult to rollback, especially when they run in production for more than a few hours allowing new data to reside in the database. That means you must somehow manage to figure out how to roll back the code WITHOUT rolling back the data. It’s an ordeal, I can assure you. Without it, DBAs face:
- Changes that leave things worse than when they started
- Piecemeal rollbacks that cripple applications
- Inconsistent support across applications and servers
2. Inadequate preventative maintenance—Preventative maintenance is essential for the long-term care and operations of a SQL Server instance and the data therein. Unfortunately, Microsoft doesn’t build in those preventative maintenance tasks. (However, to give them props, they did add a feature a couple of decades ago called “auto-update statistics,” which helps a lot with keeping your index statistics up to date. It’s not perfect, but it definitely helps.) Proper preventative maintenance helps you:
- Catch issues before they become problems
- Ensure optimal performance
- Determine when to perform resource-intensive operations with few, if any, users on the system.
Preventative maintenance on SQL Server should include:
- Database consistency checks (DBCC) and CHECKIDENT
- Backups with verification and Restore checks
- Defragmentation, along with rebuilding Fill factor and possibly using the Pad Index option
- Index Statistics refresh
For more best practices, check out the following resources:
- Preventing SQL Server Performance Problems Before They Hit Production
1. Backups and recovery—These are essential tasks, but unfortunately, DBAs don’t test backup or recovery as often as they should, which can cause many problems, including:
- Not knowing if you can meet your service level agreements (SLAs), Recovery Time Objective (RTO), or Recovery Point Objective (RPO)
- Not certain that backups are good (verified and available)
- Not knowing where all the log files, database files, etc. are for recovery. Also, don’t forget that logins and permissions are stored in the Master database, while SQL Agent jobs for a given database process are stored in the MSDB database.
- Not knowing if you have all the databases that are needed
- No testing for a full, ground-up restore
- Not knowing if you can restore older, archived data
When I presented this webinar, we received a handful of questions from attendees. I’ve captured them below, along with my answers.
Q: How much data would be lost when recovering the corrupted database with data loss?
A: There are two factors at play in recovering a corrupted database. Of course, if backups weren’t set up well in the first place, you could potentially lose LOTS of very important data, maybe even the entire database. But what if you have good backups? You’ve got two options.
First, if you run DBCC CHECKDB with REPAIR_ALL_DATA_LOSS, you will almost certainly lose one or more database pages that have been marked as corrupt. If that’s the case, I recommend that you manually run DBCC CHECKALLOC on each table until you identify all corrupted database pages. Then, you can pipe that data out of the database using something like the Import/Export Wizard or PowerShell. Many times, most of the records on a flagged 8k data pages are fine, and only a few records per page are problematic. That means you can manually recover a lot, if not all, of that data.
Second, how frequent and undamaged are your transaction logs backups and full database backups? It’s possible that with good database backups and, separately, good transaction log backups, that you can go through the process of recovery and lose only data entered up to the point in time of the corruption (RPO). If that’s the case, and your business owners are ok with it, you might simply be able to recover the affected database and logs with, say, an hour’s loss of data entry.
Q: Is it better to use a Microsoft/Gmail login to sign into the application rather than using credentials?
A: I’m not an expert on application authentication by any means. In general terms, I’d suggest that you want to use Microsoft login credentials if the database is internal to your organization. If you’re connecting to a SQL Server database application that is in some way exposed to the internet, then you might want to use a Gmail login. However, there are many security risks when you expose a database to the internet. In a situation like that, you might need to add additional security software like Kerberos or Azure Identity Management. For more information, try referencing this Microsoft documentation.
Q: How do you plan backup and recovery tests for 1500 databases?
A: You’d do this the same way you process any task in SQL Server that requires massive scalability— with scripting. You can use either T-SQL or PowerShell. You don’t describe whether these databases are all on one instance or spread across many. So, let's assume the worst and that these are databases spread across multiple servers. In this situation, my approach would be to create a table with source_server and source_database as well as target_server and target_database. Then, I would write a script to iterate through the list of databases, running each backup from the source to each target, measuring their time and data volume, letting me know whether they succeeded or failed.
I believe that dbatools.io already has a script to do the backup and recovery test. So, all that you’d need to do is build the table to hold the source and target info, and the script to pipe those values into the DBAtool.io POSH script.
Q: What are the biggest issues with transaction log backups?
A: Transaction logs, fortunately, have remained steadfast and unchanged for about three decades. That means almost all the problems are now well understood and most issues have well-documented fixes. In my experience, there are two very common problems.
First, Devs can sometimes build apps without understanding how to write operations (INSERT/UPDATE/DELETE/MERGE) that interact with the transaction log. So, it’s not uncommon to see apps performing huge, multi-row write operations in a single transaction. While that transaction is in flight, the write operation remains open as do all other transactions happening after it. Those transactions coming in after the huge write operation all remain open too, which causes bloated transaction logs that cannot be truncated until the huge transaction completes. This can go on even to the point where the log entirely fills the disk where it lives and crashing SQL Server.
Second—and this goes back to my point about going with the defaults—one of the most common causes of transaction logs filling up is that databases are set to FULL recovery model (or whatever the setting is for the Model database) by default, but that the logs do not have a log backups scheduled by default. When in FULL recovery model, only running a backup on the transaction log with truncate it and keep its size in check. So, your options here are to either run in FULL recovery model with frequent log backups (my default is every 15 minutes) or to run in SIMPLE recovery model.
Remember: only a few big DBA blunders are due to technical skills, or the lack thereof. Most of the mistakes above are due to process and business issues.
Once you apply these best practices, you should find that you and your databases perform at a higher level of efficiency and effectiveness.
Be sure to watch the full Top 10 SQL Server Mistakes Made by DBAs on-demand webinar for more in-depth commentary!
Kevin (@kekline) serves as Principal Program Manager at SentryOne. He is a founder and former president of PASS and the author of popular IT books like SQL in a Nutshell. Kevin is a renowned database expert, software industry veteran, Microsoft SQL Server MVP, and long-time blogger at SentryOne. As a noted leader in the SQL Server community, Kevin blogs about Microsoft Data Platform features and best practices, SQL Server trends, and professional development for data professionals.