Testing Data and Data-Centric Applications Whitepaper


Why data testing is as important as application testing

Data is critical to organizations today. Businesses depend on accurate data to determine whether they are meeting objectives, make decisions about new products and offerings, and evaluate the success of current initiatives. Governments use data to determine which programs are successful and which are not. And non-profits use data to evaluate the impact they're making.

There are countless examples of data being used to support critical processes today. However, most of the energy and effort in IT testing focuses on the application functionality that creates or uses the data, not on verifying the end result: the data itself. Data-centric processes, such as data integration; extract, transform, and load (ETL) processes; and analytic applications, are often not tested or are only subjected to simple manual testing. On the other hand, application functionality (like the application of business rules or the implementation of a calculation) is tested extensively, but at an application level only.

Why has there been a historical focus on application testing? For many years, most organizations have concentrated on testing application logic because that’s where the interest was. People were focused on developing new and better applications. They wanted to be able to develop these applications quickly, iterate on them rapidly, and build new ones when the business drivers changed.

This emphasis on applications has required flexible and powerful testing frameworks. After all, it's very difficult to make rapid changes to an application without having a solid set of test cases that can validate that the changes you just made are actually working.

Testing the data in data-centric applications is an often overlooked part of the project.

It was often thought that the application would be the only thing working with the data, so if the application was “correct,” then the data must be correct as well. In practical terms, though, most data today is used and manipulated by multiple systems. Now you have to verify all the applications that might have access to the data, ensure that these applications interact with the data correctly, and confirm that there are no issues with cross-interactions. The problem is even more complex in today’s self-service world because new applications that use your data can be added at any time, often without you being aware.

Another reason that data-centric testing hasn’t been a focus is the perception that testing application logic is “easy,” while testing data is “hard.”


Businesses are becoming more data-driven

Organizations are realizing that the real value is in the data they collect and manage—the applications that work with the data are subject to constant change and replacement. In many cases, the data produced from the applications is more valuable than the application itself. So, while we continue to need to test application logic, we also need to test data. This is particularly true in the following cases:

  • The data is business-critical or a differentiator for the organization

  • The data is accessed by multiple applications or systems

  • The data is part of a data-centric application or workflow (for example, data integration between systems, ETL, or a data warehouse)


Benefits of data testing 

The major benefit of testing your data and data-centric applications is confidence in your data. One of the more common reasons for business intelligence initiatives to fail is that the users lack confidence in the results. By testing and verifying both the processes and the data that you are using, you can give the consumers of the data the confidence they need to make business decisions.


Challenges of data testing

One of the biggest challenges with testing data-centric applications is that you are interacting with data. To test it well, you need a set of data that addresses the test scenarios. Depending on the goal of the test, you might need a small, static set of data that represents some specific expected data details, or you might need a much larger set of test data that represents your production data.
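As a minimal sketch of the "small, static set of data" case, the Python example below loads a handful of hand-picked rows into an in-memory SQLite database, runs a transformation, and compares the output to known expected values. The table names, the rows, and the load_customer_totals function are all hypothetical and chosen only for illustration; they stand in for whatever data-centric process you are testing.

```python
import sqlite3

# Hypothetical fixture: a small, static input set chosen to cover specific
# cases (a customer with several orders, a NULL amount, a single order).
STATIC_ORDERS = [
    (1, "alice", 10.0),
    (2, "alice", 5.5),
    (3, "bob", None),
    (4, "carol", 7.0),
]

# Expected output, worked out by hand from the static input above.
EXPECTED_TOTALS = [("alice", 15.5), ("bob", 0.0), ("carol", 7.0)]


def load_customer_totals(conn):
    """Hypothetical transformation under test: total order amount per customer."""
    conn.execute(
        """
        CREATE TABLE customer_totals AS
        SELECT customer, COALESCE(SUM(amount), 0.0) AS total
        FROM orders
        GROUP BY customer
        """
    )


def run_static_test():
    """Load the static rows, run the transformation, and return its output."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", STATIC_ORDERS)
    load_customer_totals(conn)
    rows = conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    assert run_static_test() == EXPECTED_TOTALS
    print("static data test passed")
```

Because the input never changes, the expected output can be written down once and checked on every run. A larger, production-like data set would typically be checked with rules (counts, ranges, uniqueness) rather than exact expected rows.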


Tools for data-centric testing

The tools available for data-centric testing are, for the most part, lacking in several noticeable ways.

First, most tools target one particular technology and don’t provide a way to use the same testing approaches and logic across the different technologies that an organization might use. A certain amount of that is expected, as it is quite difficult to cover every possible data-centric technology available. As an example, there are testing tools for Microsoft SQL Server relational databases.


Criteria for choosing a testing tool

As you are looking for tools to drive your data-centric testing initiatives, keep the following criteria in mind:


Automated testing support is critical to any modern testing initiative. You should be able to execute most, if not all, of your tests without requiring any human interaction. This enables you to run tests while you do other things, freeing up resources and time for more critical tasks. It also means that the tests are executed consistently. Manual testing introduces the chance of human error—perhaps a tester forgets to execute a test or a setup step. Automated testing means that you get exactly the same tests executed the same way, every time.
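To illustrate what running tests without human interaction can look like, here is a minimal Python sketch under stated assumptions: the customers table, the two checks, and the sample rows are hypothetical, and a real tool would offer far more. The point is only that every check is registered once and then executed the same way on every run, with a result a scheduler can act on.

```python
import sqlite3


def check_no_null_keys(conn):
    """Pass when no row in the (hypothetical) customers table lacks an id."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM customers WHERE id IS NULL"
    ).fetchone()
    return n == 0


def check_unique_keys(conn):
    """Pass when no id appears more than once in customers."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM "
        "(SELECT id FROM customers GROUP BY id HAVING COUNT(*) > 1)"
    ).fetchone()
    return n == 0


# Registering checks in one list means none can be forgotten on a given run.
CHECKS = [check_no_null_keys, check_unique_keys]


def run_checks(conn):
    """Run every registered check in order and return {check name: passed}."""
    return {check.__name__: check(conn) for check in CHECKS}


def build_sample_db():
    """Hypothetical sample data containing one deliberate duplicate id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "alice"), (2, "bob"), (2, "bob again")],
    )
    return conn


def main():
    """Return 0 when all checks pass and 1 otherwise, for use as an exit code."""
    results = run_checks(build_sample_db())
    for name, passed in results.items():
        print(f"{name}: {'PASS' if passed else 'FAIL'}")
    return 0 if all(results.values()) else 1


if __name__ == "__main__":
    print("exit code:", main())
```

In a scheduled job or build pipeline, the value returned by main() could be passed to sys.exit so that a failing check marks the run as failed without anyone having to read the output.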

Key roles in data testing

There can be a wide variety of people involved in testing, but the data-centric testing process focuses on a few key roles. These roles don't have to be carried out by different people, but each role has a specific focus in the testing process.

Involve these roles in your testing strategy:


Importance of data testing for data-driven businesses

Data-centric testing is a critical factor in today’s data-driven world. The quality, accuracy, and reliability of the data your organization works from are not things that can be left up to chance, or to the hope that “nothing will go wrong.” You need to have confidence in your data and be able to prove that it's accurate and adheres to your organization's requirements.

For more details about applying the principles of data-centric testing, check out the related article Testing Data-Centric Code in Development. This article focuses on how you can adopt data-centric testing as part of your development processes, along with the various types of testing that you can consider as part of developing new and enhanced functionality and data. You'll also see how to apply data verification testing to data throughout your organization, which can increase your confidence in the data you work with every day.
