You are here

Data Virtualization and ETL (Extract Transform Load)

Data Virtualization and ETL (Extract Transform Load) are often pitted as enemies, with ETL users and vendors often opposing adoption of Data Virtualization. The truth is this – almost 100% of Data Virtualization adopters also have ETL and continue to use both effectively.  Those who have not adopted Data Virtualization are losing out. Extract Transform Load is needed for large bulk movement of data. Data Virtualization delivers information access faster. Together they support the broader goals of business insight, analytics and reporting:

  • Agile BI/Reporting. DV is used for agile, real-time reporting and to bring in new sources quickly. But Data Virtualization also combines the backdrop of historical trend data (persisted using Extract Transform Load).
  • Prototyping of Data Warehouse and Reports – Data Virtualization defers costly commitment to Extract Transform Load process development until the needs are proven. Once it is, Data Virtualization accelerates the dialog between business users and IT to reduce risk of an ETL project.
  • Virtual Data Marts. ETL (Extract Transform Load) has done its job to populate a high-performance Data Warehouse from source systems. More replication may be wasteful and raise data governance problems. Data Virtualization can leverage underlying performance and be quicker and more consistent in exposing custom data views.
  • Logical Data Warehouse. When you have large amounts of data in more than one Data Warehouse, Big Data store or Data Lake, more replication through Extract Transform Load can be prohibitively costly and delay business impact. Data Virtualization delivers the logical Data Warehouse on top, while Extract Transform Load continues to populate the physical data stores below. Data Virtualization also enables data services to enable operational use of data.
  • Extract Transform Load from Canonical Data Virtualization Views. An increasingly common practice is to let ETL do the heavy batch data movement, integration and cleansing process, but against virtualized data sources, including hard to integrate semi and unstructured sources.
  • Data Virtualization can also trigger Extract Transform Load. In hybrid scenarios where real-time queries are combined with partial caching and partial pre-fetch/persistence the Data Virtualization platform can trigger an Extract Transform Load process (In the case of Denodo both a built-in batch scheduler and message-based external ETL process can be triggered when certain data conditions exist).

There are several more scenarios for Extract Transform Load and Data Virtualization to collaborate. Architecturally speaking, when Data Virtualization is below ETL, the virtual data views are source for Extract Transform Load. When Data Virtualization is above, it uses the secondary or derived data sources populated by ETL to create virtual data services. And when used side-by-side they create hybrid integration patterns.

Learn more about Data Virtualization and ETL (Extract Transform Load) with this whitepaper.

Other Data Management tools: