Data Virtualization and Data Quality
Trusted data ranks high on the list of requirements in modern business, alongside speed and agility. Leading Data Virtualization platforms include comparable functions for integration, transformation, data matching, and data cleansing, but they differ from traditional batch data integration and data quality tools in that these functions must be applied in real time to data in flight. This is not always possible, particularly if the source data is not well managed and multi-pass data quality work, with intervention by a data steward, is required. Data Virtualization and data quality tools therefore play complementary roles. Data Virtualization is well suited to quickly bringing sources together and addressing the most common data quality issues in real time (deterministic or programmatic data quality functions), and also to exposing more complex data quality issues at the source and flagging them for later cleansing by data quality tools. Below are some of the ways in which Data Virtualization and data quality interact to achieve higher data quality with agility:
- Data Virtualization includes many of the matching, transformation, and data quality functions typically found in data quality tools. When something more is needed, the Data Virtualization platform allows the creation of custom functions and function calls to external data quality, matching, and reference data tools.
- Data Virtualization can invoke built-in or external data profiling tools to assess source data quality and establish its suitability for real-time integration.
- A common usage pattern is to use Data Virtualization for real-time integration with one of the sources being a reference data manager that provides the matching rules. This pattern is used for creating operational virtual data marts and virtual registry-style MDM solutions.
- Data Virtualization can flag non-matching or problematic data across multiple sources, and these combined views can then be accessed by data quality tools to explore source data discrepancies further.
- Data lineage and change impact analysis are other Data Virtualization features that tell consumers of real-time data services which data comes from more versus less trusted sources, which data quality operations were applied in the Data Virtualization layer, and so on, so that the overall result can be weighted for trustworthiness.
- All of the above information, such as data lineage, in-flight data cleansing rules, flagged data (usually exported and persisted), and Data Virtualization logs, can be easily visualized from external data quality and management tools. This allows data stewards to quickly address the most important data quality issues at the source.
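The split described above, between deterministic rules that can run in flight and issues that must be flagged for a data steward, can be sketched in Python. This is an illustrative sketch only; the record fields, rules, and function names are assumptions, not any vendor's API:

```python
import re

def cleanse_record(record):
    """Apply deterministic, single-pass cleansing rules to a record in flight.

    Returns the cleansed record plus a list of issues that could not be
    fixed deterministically and should be flagged for later cleansing.
    """
    cleansed = dict(record)
    issues = []

    # Rule 1 (deterministic): trim whitespace and normalize name casing.
    if cleansed.get("name"):
        cleansed["name"] = cleansed["name"].strip().title()

    # Rule 2 (deterministic where possible): standardize phone to digits only.
    if cleansed.get("phone"):
        digits = re.sub(r"\D", "", cleansed["phone"])
        if len(digits) == 10:
            cleansed["phone"] = digits
        else:
            # Cannot be fixed in flight; flag for source-side cleansing.
            issues.append(("phone", cleansed["phone"], "unparseable phone number"))

    # Rule 3: a missing email is a source-quality problem, not fixable here.
    if not cleansed.get("email"):
        issues.append(("email", None, "missing email"))

    return cleansed, issues

record = {"name": "  ada lovelace ", "phone": "(555) 123-4567", "email": ""}
clean, flagged = cleanse_record(record)
# clean["name"] == "Ada Lovelace", clean["phone"] == "5551234567"
# flagged == [("email", None, "missing email")]
```

The key design point is that each rule either resolves the issue in a single deterministic pass or emits a flag; nothing requires multi-pass processing or human judgment inside the real-time path.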
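The registry-style pattern from the bullets above, where combined sources are matched against keys supplied by a reference data manager and non-matching records are flagged, might look like this minimal Python sketch (the `customer_id` key and dict-based sources are illustrative assumptions):

```python
def build_registry_view(sources, reference_keys):
    """Combine records from multiple sources into a virtual registry view.

    Records whose key appears in the reference data are matched; the rest
    are flagged for exploration by data quality tools.
    """
    matched, unmatched = [], []
    for source_name, records in sources.items():
        for rec in records:
            # Annotate each record with its origin for lineage purposes.
            annotated = {**rec, "source": source_name}
            if rec.get("customer_id") in reference_keys:
                matched.append(annotated)
            else:
                unmatched.append(annotated)
    return matched, unmatched

# Keys provided by a hypothetical reference data manager.
reference = {"C001", "C002"}
sources = {
    "crm": [{"customer_id": "C001", "name": "Ada"}],
    "billing": [{"customer_id": "C999", "name": "Bob"}],
}
matched, unmatched = build_registry_view(sources, reference)
# matched contains the CRM record; the billing record is flagged as unmatched.
```

In a real platform the matching rules would typically be richer than an exact key lookup (fuzzy matching, survivorship rules), but the flow is the same: match in flight, flag the remainder.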
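Weighting a combined result for trustworthiness based on lineage, as mentioned above, could be sketched as a simple weighted average. The lineage structure here (per-source trust score and fraction of fields contributed) is an assumption for illustration:

```python
def weighted_trust(lineage):
    """Compute an overall trustworthiness weight for a combined view.

    lineage maps each source to (trust_score, contribution), where
    trust_score is 0..1 and contribution is the fraction of the view's
    fields that came from that source.
    """
    total = sum(contribution for _, contribution in lineage.values())
    return sum(trust * contribution
               for trust, contribution in lineage.values()) / total

# Hypothetical lineage metadata for a view built from two sources.
lineage = {"erp": (0.9, 0.5), "spreadsheet": (0.4, 0.5)}
score = weighted_trust(lineage)  # (0.9*0.5 + 0.4*0.5) / 1.0 == 0.65
```

A consumer of the real-time data service could use such a score to decide whether a result is trustworthy enough for its purpose, or whether the less-trusted source needs steward attention first.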
Other Data Management tools:
- Data Virtualization and ETL (Extract Transform Load)
- Data Virtualization and ESB
- Data Virtualization and MDM
- Data Virtualization and Data Modeling/Governance Tools