Data is critical to organizations today. Businesses depend on accurate data to determine whether they are meeting objectives, make decisions about new products and offerings, and evaluate the success of current initiatives. Governments use data to determine which programs are successful and which are not. And non-profits use data to evaluate the impact they're making.
There are countless examples of data being used to support critical processes today. However, most IT testing effort goes into the application functionality that creates or uses the data, not into verifying the end result: the data itself. Often, data-centric processes, such as data integration, extract, transform, and load (ETL) processes, and analytic
Why has there been a historical focus on application testing? For many years, most organizations have concentrated on testing application logic because that’s where the interest was. People were focused on developing new and better applications. They wanted to be able to develop these applications quickly, iterate on them rapidly, and build new ones when the business drivers changed.
This emphasis on applications has required flexible and powerful testing frameworks. After all, it's very difficult to make rapid changes to an application without having a solid set of test cases that can validate that the changes you just made are actually working.
Testing the data in data-centric applications is an often overlooked part of the project.
It was often thought that the application would be the only thing working with the data, so if the application was “correct,” then the data must be correct as well. In practical terms, though, most data today is used and manipulated by multiple systems. Now you have to verify all the applications that might have access to the data, ensure that these applications interact with the data correctly, and confirm that there are no issues with cross-interactions. The problem is even more complex in today’s self-service world because new applications that use your data can be added at any time, often without your knowledge.
Another reason that data-centric testing hasn’t been a focus is the perception that testing application logic is “easy,” while testing data is “hard.”
Organizations are realizing that the real value is in the data they collect and manage—the applications that work with the data are subject to constant change and replacement. In many cases, the data produced from the applications is more valuable than the application itself. So, while we continue to need to test application logic, we also need to test data. This is particularly true in the following cases:
The data is business-critical or a differentiator for the organization
The data is interacted with from multiple applications or systems
The data is part of a data-centric application or workflow (for example, data integration between systems, ETL, or a data warehouse)
The major benefit of testing your data and data-centric applications is confidence in your data. One of the more common reasons for business intelligence initiatives to fail is that the users lack confidence in the results. By testing and verifying both the processes and the data that you are using, you can give the consumers of the data the confidence they need to make business decisions.
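One way to picture verifying the data itself, independent of the application that produced it, is a set of reconciliation checks run against a loaded table. The following is a minimal sketch, not a definitive implementation. The table names (`source_orders`, `warehouse_orders`), the column names, and the `verify_load` helper are all hypothetical, and an in-memory SQLite database stands in for whatever systems you actually use.

```python
import sqlite3


def verify_load(conn, source, target, key, amount):
    """Compare a loaded table with its source; return a list of problems found.

    Table and column names are interpolated directly into the SQL, so they
    must be trusted, hard-coded identifiers, never user input.
    """
    def scalar(sql):
        # Run a query that returns a single value and unwrap it.
        return conn.execute(sql).fetchone()[0]

    problems = []

    # Check 1: the row counts should reconcile between source and target.
    src_rows = scalar(f"SELECT COUNT(*) FROM {source}")
    tgt_rows = scalar(f"SELECT COUNT(*) FROM {target}")
    if src_rows != tgt_rows:
        problems.append(f"row count mismatch: source={src_rows}, target={tgt_rows}")

    # Check 2: the key column must never be NULL in the target.
    null_keys = scalar(f"SELECT COUNT(*) FROM {target} WHERE {key} IS NULL")
    if null_keys:
        problems.append(f"{null_keys} row(s) with a NULL {key}")

    # Check 3: each key value should appear only once in the target.
    dup_keys = scalar(
        f"SELECT COUNT(*) FROM (SELECT {key} FROM {target} "
        f"WHERE {key} IS NOT NULL GROUP BY {key} HAVING COUNT(*) > 1)"
    )
    if dup_keys:
        problems.append(f"{dup_keys} duplicate {key} value(s)")

    # Check 4: a business total should survive the load unchanged.
    src_total = scalar(f"SELECT COALESCE(SUM({amount}), 0) FROM {source}")
    tgt_total = scalar(f"SELECT COALESCE(SUM({amount}), 0) FROM {target}")
    if src_total != tgt_total:
        problems.append(f"total mismatch: source={src_total}, target={tgt_total}")

    return problems


# Hypothetical sample data: order 3 was lost and order 2 was loaded twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_orders (order_id INTEGER, amount_cents INTEGER);
    CREATE TABLE warehouse_orders (order_id INTEGER, amount_cents INTEGER);
    INSERT INTO source_orders VALUES (1, 1000), (2, 2500), (3, 400);
    INSERT INTO warehouse_orders VALUES (1, 1000), (2, 2500), (2, 2500);
""")

problems = verify_load(
    conn, "source_orders", "warehouse_orders", "order_id", "amount_cents"
)
for problem in problems:
    print(problem)
```

With this sample data the row counts happen to agree, so a count-only check would pass. The duplicate-key and total checks are the ones that flag the bad load, which is why a verification suite usually combines several independent checks.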
One of the biggest challenges with testing data-centric applications is the data itself. To test well, you need a data set that covers your test scenarios. Depending on the goal of the test, you might need a small, static data set that contains specific, known values with known expected results, or you might need a much larger set of test data that is representative of your production data.
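As an illustration of the small, static kind of test data, the sketch below pairs a handful of hand-written input records with the exact output each one should produce. Everything here is assumed for the example: `normalize_customer` is a hypothetical transform standing in for a real data-centric process, and the field names and `failing_cases` helper are invented.

```python
from datetime import date


def normalize_customer(row):
    """Hypothetical transform under test: tidy one raw customer record."""
    email = (row.get("email") or "").strip().lower()
    return {
        "id": int(row["id"]),
        # Collapse runs of whitespace and trim the ends.
        "name": " ".join(row["name"].split()),
        # Treat missing or blank emails as None.
        "email": email or None,
        "signup_date": date.fromisoformat(row["signup_date"]),
    }


# Small, static test data: each case pairs a raw input with its expected output.
STATIC_CASES = [
    (   # Messy whitespace and mixed-case email.
        {"id": "1", "name": "  Pat   Example ", "email": "PAT@Example.com ",
         "signup_date": "2020-01-15"},
        {"id": 1, "name": "Pat Example", "email": "pat@example.com",
         "signup_date": date(2020, 1, 15)},
    ),
    (   # Missing email.
        {"id": "2", "name": "Sam Sample", "email": None,
         "signup_date": "2019-12-31"},
        {"id": 2, "name": "Sam Sample", "email": None,
         "signup_date": date(2019, 12, 31)},
    ),
    (   # Blank email should be handled like a missing one.
        {"id": "3", "name": "Lee Test", "email": "   ",
         "signup_date": "2021-06-01"},
        {"id": 3, "name": "Lee Test", "email": None,
         "signup_date": date(2021, 6, 1)},
    ),
]


def failing_cases(transform, cases):
    """Return (raw, expected, actual) for every case the transform gets wrong."""
    failures = []
    for raw, expected in cases:
        actual = transform(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures


failures = failing_cases(normalize_customer, STATIC_CASES)
print(f"{len(failures)} of {len(STATIC_CASES)} static cases failed")
```

Because the inputs never change, a failure points at the transform rather than at the data. A larger, production-like data set answers a different question, such as how the process behaves with realistic volume and variety.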
As mentioned in the previous section, the tools available for data-centric testing are, for the most part, lacking in several noticeable ways.
One, most tools are targeted to a particular tool or
As you are looking for tools to drive your data-centric testing initiatives, keep the following criteria in mind:
A wide variety of people can be involved in testing, but the data-centric testing process focuses on a few key roles. These roles don't have to be carried out by different people, but each role has a specific focus in the testing process.
Involve these roles in your testing strategy:
Data-centric testing is a critical factor in today’s data-driven world. The quality, accuracy, and reliability of the data your organization works from are not something that can be left up to
For more details about applying the principles of data-centric testing, check out the related article Testing Data-Centric Code in Development. This article focuses on how you can adopt data-centric testing as part of your development processes, along with the various types of testing that you can consider as part of developing new and enhanced functionality and data. You'll also see how to apply data verification testing to data throughout your organization, which can increase your confidence in the data you work with every day.