A Beginner’s Guide to Building and Maintaining Database Documentation

Joey D’Antoni

Published On: May 21, 2021

Categories: Documentation, Database Mapper 0

Although writing better queries and building the right indexes are important parts of improving database performance, building clear database documentation can also contribute to this goal by helping you understand your database architecture. Painting a clear picture of the structure of your database gives you insight into your data flows and helps you identify redundant data and clarify business processes.

Let’s take a look at a few approaches you can take to documenting your database and your data, depending on the nature of your application.

Self-Documenting With Metadata

If you’re designing a database from scratch, you can build your documentation into the data structures themselves using the extended properties feature of SQL Server and Azure SQL Database, as shown in the following example from the AdventureWorks sample database:

/****** Object: Table [Person].[Address] Script Date: 3/21/2021 10:48:07 AM ******/


CREATE TABLE [Person].[Address](
[AddressID] [int] IDENTITY(1,1) NOT NULL,
[AddressLine1] [nvarchar](60) NOT NULL,
[AddressLine2] [nvarchar](60) NULL,
[City] [nvarchar](30) NOT NULL,
[StateProvinceID] [int] NOT NULL,
[PostalCode] [nvarchar](15) NOT NULL,
[SpatialLocation] [geography] NULL,
[rowguid] [uniqueidentifier] NOT NULL,
[ModifiedDate] [datetime] NOT NULL,
[AddressID] ASC

ALTER TABLE [Person].[Address] ADD CONSTRAINT [DF_Address_rowguid] DEFAULT (newid()) FOR [rowguid]

ALTER TABLE [Person].[Address] ADD CONSTRAINT [DF_Address_ModifiedDate] DEFAULT (getdate()) FOR [ModifiedDate]

ALTER TABLE [Person].[Address] WITH CHECK ADD CONSTRAINT [FK_Address_StateProvince_StateProvinceID] FOREIGN KEY([StateProvinceID])
REFERENCES [Person].[StateProvince] ([StateProvinceID])

ALTER TABLE [Person].[Address] CHECK CONSTRAINT [FK_Address_StateProvince_StateProvinceID]

EXEC sys.sp_addextendedproperty @name=N’MS_Description’, @value=N’Primary key for Address records.’ , @level0type=N’SCHEMA’,@level0name=N’Person’, @level1type=N’TABLE’,@level1name=N’Address’, @level2type=N’COLUMN’,@level2name=N’AddressID’

EXEC sys.sp_addextendedproperty @name=N’MS_Description’, @value=N’First street address line.’ , @level0type=N’SCHEMA’,@level0name=N’Person’, @level1type=N’TABLE’,@level1name=N’Address’, @level2type=N’COLUMN’,@level2name=N’AddressLine1’

EXEC sys.sp_addextendedproperty @name=N’MS_Description’, @value=N’Second street address line.’ , @level0type=N’SCHEMA’,@level0name=N’Person’, @level1type=N’TABLE’,@level1name=N’Address’, @level2type=N’COLUMN’,@level2name=N’AddressLine2’

EXEC sys.sp_addextendedproperty @name=N’MS_Description’, @value=N’Name of the city.’ , @level0type=N’SCHEMA’,@level0name=N’Person’, @level1type=N’TABLE’,@level1name=N’Address’, @level2type=N’COLUMN’,@level2name=N’City’

EXEC sys.sp_addextendedproperty @name=N’MS_Description’, @value=N’Unique identification number for the state or province. Foreign key to StateProvince table.’ , @level0type=N’SCHEMA’,@level0name=N’Person’, @level1type=N’TABLE’,@level1name=N’Address’, @level2type=N’COLUMN’,@level2name=N’StateProvinceID’

EXEC sys.sp_addextendedproperty @name=N’MS_Description’, @value=N’Postal code for the street address.’ , @level0type=N’SCHEMA’,@level0name=N’Person’, @level1type=N’TABLE’,@level1name=N’Address’, @level2type=N’COLUMN’,@level2name=N’PostalCode’

EXEC sys.sp_addextendedproperty @name=N’MS_Description’, @value=N’Latitude and longitude of this address.’ , @level0type=N’SCHEMA’,@level0name=N’Person’, @level1type=N’TABLE’,@level1name=N’Address’, @level2type=N’COLUMN’,@level2name=N’SpatialLocation’

EXEC sys.sp_addextendedproperty @name=N’MS_Description’, @value=N’ROWGUIDCOL number uniquely identifying the record. Used to support a merge replication sample.’ , @level0type=N’SCHEMA’,@level0name=N’Person’, @level1type=N’TABLE’,@level1name=N’Address’, @level2type=N’COLUMN’,@level2name=N’rowguid’

EXEC sys.sp_addextendedproperty @name=N’MS_Description’, @value=N’Date and time the record was last updated.’ , @level0type=N’SCHEMA’,@level0name=N’Person’, @level1type=N’TABLE’,@level1name=N’Address’, @level2type=N’COLUMN’,@level2name=N’ModifiedDate’

EXEC sys.sp_addextendedproperty @name=N’MS_Description’, @value=N’Street address information for customers, employees, and vendors.’ , @level0type=N’SCHEMA’,@level0name=N’Person’, @level1type=N’TABLE’,@level1name=N’Address’

Beyond using the extended properties feature in the database engine, you can also use the data classifications feature in SQL Server 2019 or Azure SQL Database to classify the data stored in your tables. Combining these two features allows you to build a definition of your database within your schema that can be easily extracted and stored in the database.

Reverse Engineering

Reverse engineering your database isn’t the best approach for documenting your database. You derive more benefit from building documentation while you’re designing your database schema because you can make thoughtful decisions about the designs of tables and data relationships.

However, there are many scenarios in which you don’t have the opportunity to engage in such thoughtful design. For example, if you’ve acquired a company that had a custom application, and its application and administration teams are no longer available.

The best way to understand the design of such an application is to reverse engineer the structure of the database and any data flows in it (e.g., replication to other databases). This process allows you to have some understanding of the application and the potential to identify performance bottlenecks. In a fairly common business scenario of an acquisition, this gives you insight into the existing business processes of the acquired company, or at least the data flow for an operating model.

Understanding Data Lineage

One common database documentation task is understanding data lineage. Data lineage is the path your data takes from its initial creation, through its transformation across different systems over time, and to its final resting state. Whether you’re using extract, transform, and load (ETL) tools, procedural code, or APIs, there are many different touch points where data can be changed from the original source format.

There are several reasons to build out your data lineage, particularly to understand the business meaning of your data: who defined and designed the calculations in a report, what assumptions were made, and the origin and quality of any external data. This process can lead to better data quality by examining the history of the data and its changes and provides better compliance and auditability because any transformation decisions are clearly documented.

Beyond the Database Engine

Although the focus on the database is important, you need to be aware of several other processes when building data documentation.

At the core of most business intelligence processes are the aforementioned ETL processes. ETL captures data from source transactional systems, transforms it, and loads it into a dimensional model in a data warehouse. Having full documentation of these processes allows you to have better understanding of any transformations in your code. Most importantly, the documentation allows you to change your existing ETL processes more easily by understanding the decisions and business logic that were built into the design.

Beyond ETL processes, many organizations use tools such as SQL Server Analysis Services (SSAS) to perform real-time calculations on their data. SSAS provides several calculation options, including key performance indicators (KPIs), which reveal significant business logic. Documenting these decisions gives you a more complete picture of your reports, which allows them to support higher-quality business decisions and to be more maintainable over time.

Getting Help With Database Documentation

Although the concept of building database documentation is well understood, it can be challenging to build a platform to capture that documentation in a consistent manner. Microsoft provides some functionality with extended properties, but functions around data lineage and reverse engineering are limited.

SentryOne Document provides a robust solution, allowing you to easily document a wide variety of data sources. Download a free SentryOne Document trial to see for yourself.

Joey D’Antoni is an ActualTech media contributor and a principal consultant at Denny Cherry and Associates, Microsoft Data Platform MVP, and VMware vExpert.