Why, When, and How to Validate Your Data
Poor-quality data is a common problem that affects companies of all sizes in all industries. Understanding where to implement data validation tests to solve for it is another challenge. SentryOne Test, which is generally available today, is designed to solve problems at every stage of the data lifecycle.
Validating data starts at the source. When building applications that create and store data, it’s vital to ensure that application changes don’t generate bad data. Because SentryOne Test is built on top of industry-standard frameworks, data validation can now be part of build pipelines.
While we’re using MSTest and NUnit, which are often associated with unit testing, we’re leveraging them for integration testing. During the development lifecycle, there are two primary ways in which data validation comes to the fore:
- Reference data validation
- Output data validation
By validating both areas during development, you can improve the quality of the data in the system. Here at SentryOne, we’ve adopted the practice of using SentryOne Test in the development of our own software. We have automated tests running regularly to perform data validation against our applications as they are being developed, as well as in our testing environments.
Data validation has been part of ETL processes for a long time, but it’s not always smooth sailing. Ensuring data quality typically involves building mechanisms into the ETL flow to parse, validate, and route data as appropriate. These routines place a maintenance overhead not only on the ETL development but also on the reporting and management workloads.
SentryOne Test addresses data validation for ETL processes, letting ETL and BI developers concentrate on what they do best. Breaking the testing out allows for a more agile approach to data validation in ETL scenarios. If the business requirements change regarding data validation, then these changes can be implemented without needing to re-work the ETL routines.
Master Data Management (MDM) is crucial to ensuring your business is complying with data privacy regulations, including the General Data Protection Regulation (GDPR) and Canada’s Anti-Spam Law (CASL). As data pros, we spend a lot of time cleaning and refining data to derive the golden records with Master Data Services (MDS) and Data Quality Services (DQS) solutions. But that golden record is of no use to anyone if it just sits in the MDM solution. Ideally, after you’ve built the master data entity, you can pass it to downstream components. Using SentryOne Test, you can verify that the single version of the truth is used in all consumer systems, giving you confidence that data is reaching business users in the correct state.
One of the key elements of good data stewardship is identifying data owners for each entity. Additionally, it’s a best practice to document this information in a searchable data dictionary.
SentryOne Test lets you write tests and then publish them to the server so that data owners can identify validation issues early. The SentryOne Test web portal lets data owners see the data that has caused tests to fail. They can then fix the data in the source systems and manually re-run the tests. All of this can then become normal business practice, which helps improve data quality.
Do you have confidence in the data your company is using to make business decisions? Outdated, inaccurate data can be a thing of the past with SentryOne Test. Automate data testing and validation from development to production with the secure, cloud-based SentryOne Test.
John Welch is the Chief Technology Officer at SentryOne, where he leads a team in the development of a suite of data and BI products that make monitoring, building, testing, and documenting data solutions faster and more efficient. John has been working with data, business intelligence, and data warehousing technologies since 2001. He was awarded as a Microsoft Most Valued Professional (MVP) 2009 - 2016 due to his commitment to sharing his knowledge with the IT community, and is an SSAS Maestro. John is an experienced speaker, having given presentations at Professional Association for SQL Server (PASS) conferences, the Microsoft Business Intelligence conference, Software Development West (SD West), Software Management Conference (ASM/SM), SQL Bits, and others. He has also contributed to multiple books on SQL Server and business intelligence.