Spotting and Avoiding Database Drift
Published On: September 8, 2021
Managing any database ecosystem is difficult enough: taking backups, maintaining statistics, and doing performance tuning all tax the time of the DBA or database developer. The job is complex even without considering the work you do to manage the various schema and data drifts that can occur.
Unless you operate in a vacuum or within a single person organization (and even then, schema drift can occur), drift is going to manifest naturally and as the size of the environment expands. The amount of effort required to curtail drift is directly proportional to the size of the environment.
Let’s discuss the various kinds of drifts you might discover in the wild.
Every database has a foundation of objects such as tables, views, functions, or some other components that either contain or manipulate data in some way. This collection of objects is referred to as a database schema.
You can think of the schema like the foundation of a building: providing the best support possible for the data contained within.
If the foundation of the building shifts, it can cause major structural issues elsewhere in the building, some of which might not be noticeable until a later point in time. If you’ve ever noticed the foundation of a structure has shifted, you may have also noticed damage is usually not repaired immediately. It can take days, weeks, or months before a more permanent fix is properly implemented.
You might see temporary shoring put into place to buy some time, and you can do the same with the database foundation. However, the longer the damage exists without remediation, the more difficult it may to be to repair it. By this time, downstream processes could be wrong, and the damage could be monumental. Managing and correcting database drift as early as possible can help ensure your foundation remains stable.
Figure 1—An example of schema drift: The index on the left doesn’t exist on the table on the right.
Just as the schema can drift, so can data contained within the schema. Most frequently, this can happen with reference or static data. However, depending on your environment, data drift can happen with transactional data. The notion of data sync is when data between tables from two different databases or environments are no longer in sync. The data is the business, so if the data drifts, then the business could respond accordingly, and it too can drift. Ensuring data drift doesn’t happen is critical to any organization, especially if you use continuous integration or deployment for your databases.
Data drift can be more subtle than schema drift. Schema drift exists in a binary state, meaning it’s either drifted or it’s not. The subtle nature of data, however, along with the potentially sensitive nature of applications means a single character difference could create a drastic change in the result.
Figure 2—An example of data drift: the RevisionNumber column isn’t the same in both tables.
Another notable quality of data drift is it can more easily occur than schema drift.
Ask yourself, who or what has permissions to alter the structure of an object within a database? Hopefully, few entities have such permissions.
Now ask who or what has permissions to read or write data?
- Those web applications? Sure do.
- Report writers?
- Legacy applications that can’t be updated or modified because they’ll break? Without a doubt.
And that could be a problem.
Finally, remember data security can help in the fight against data drift. As you start to explore the environment looking for drifts, make sure to examine the security policies to ensure only the expected entities are allowed to manipulate data.
Operational drift is a different beast in that it can manifest in a variety of ways. Operational drift can occur when things change; for example, increased storage consumption because the indexes are no longer congruent causes drift.
This isn’t because the foundation has shifted or because the data is out of sync; it’s because the indexes sitting on top of the foundation have drifted.
In SQL Server, you can have one clustered index (which represents the table) and up to 999 non-clustered indexes. When non-clustered indexes are defined, an index key (one or more columns from the table) is defined, and any correlating Include Columns along with it. Include Columns are just pointers back to the data in the clustered index. If additional key columns are added or removed, this would change the size of the index.
Another way to create operational drift is by over-indexing tables for a specific workload where existing indexes could be modified to serve the workload. This scenario is common when developers are attempting to increase the performance of the specific query they’re working on. Instead, they should be looking at indexes at a holistic level to see what can be modified to meet their needs rather than myopically creating new indexes.
Operational drift can easily manifest itself as you move workloads to the public cloud. Unlike the traditional data center environment, it’s common for more people to be able to create resources in your cloud environment than they did on-premises. While this process creates more agility, it can also be a governance and accounting nightmare. If resources aren’t properly tagged or documented, owners and associated projects can’t be easily identified.
Give Drift the Boot
Regardless of the type of drift you find in your environment, management of the drift is crucial. If you don’t get the drift under control, it can quickly evolve into a downward spiral, which can cause significant issues downstream in any development lifecycle. Make sure to evaluate the SolarWinds® DBA xPress free tool and all its built-in components. Each one can help you manage and contain a certain aspect of database drift.
John is a Principal Consultant with Denny Cherry & Associates Consulting holding Microsoft Data Platform MVP and VMware vExpert awards. He specializes in deploying SQL Server related solutions to solve business needs for organizations.