Data Integration Overview

What is data integration?

Data integration is the process of combining data from various sources into a single, unified view. Organizations around the world use these unified views for business intelligence, reporting, and analytics.

Data integration primarily consists of three key steps:

1. Accessing data from heterogeneous sources, which may reside on-premises or in the cloud, over a variety of access protocols.


2. Integrating the data by applying transformations such as data mapping, validation, normalization, and data quality checks; the exact steps depend on the integration style.

3. Delivering the integrated data to consumers, whether end users or applications, over various protocols for business reporting and analysis.
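To make the three steps concrete, here is a minimal Python sketch. The two sources (`CRM_CUSTOMERS`, `BILLING_ACCOUNTS`), their field names, and the functions `access`, `integrate`, and `deliver` are hypothetical stand-ins; real sources would be reached through database drivers, file readers, or APIs rather than in-memory lists.

```python
from typing import Callable

# Hypothetical in-memory sources standing in for a CRM database and a billing API.
CRM_CUSTOMERS = [
    {"cust_id": "1", "name": " Alice ", "country": "us"},
    {"cust_id": "2", "name": "Bob", "country": "DE"},
]
BILLING_ACCOUNTS = [
    {"customer": 1, "balance_cents": 12500},
    {"customer": 2, "balance_cents": None},
]


def access() -> tuple[list[dict], list[dict]]:
    """Step 1: read raw records from each source (here, copies of the lists)."""
    return list(CRM_CUSTOMERS), list(BILLING_ACCOUNTS)


def integrate(customers: list[dict], accounts: list[dict]) -> list[dict]:
    """Step 2: map keys, normalize values, check quality, and join into one view."""
    balances = {a["customer"]: a["balance_cents"] for a in accounts}
    unified = []
    for c in customers:
        cid = int(c["cust_id"])  # mapping: CRM string key -> billing integer key
        cents = balances.get(cid)
        if cents is None:  # quality check: drop customers with no known balance
            continue
        unified.append({
            "customer_id": cid,
            "name": c["name"].strip(),        # normalization: trim whitespace
            "country": c["country"].upper(),  # normalization: uppercase country codes
            "balance": cents / 100,           # unit conversion: cents -> currency units
        })
    return unified


def deliver(rows: list[dict], consumer: Callable[[dict], None]) -> None:
    """Step 3: hand each unified row to a consumer (report, dashboard, application)."""
    for row in rows:
        consumer(row)


report: list[dict] = []
deliver(integrate(*access()), report.append)
print(report)  # [{'customer_id': 1, 'name': 'Alice', 'country': 'US', 'balance': 125.0}]
```

The quality rule used here (dropping records that have no balance) is an illustrative assumption. A production pipeline might quarantine or flag such records instead.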

Primary data integration styles

Data integration has evolved over many years, and three primary styles have emerged: data virtualization; extract, transform, and load (ETL) processes; and enterprise service buses (ESBs). Of these, data virtualization is the most recent.

Data virtualization is a real-time data integration style that creates a virtual, integrated data layer. This layer provides an abstraction above the physical data sources and delivers the combined information to consuming applications.
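A data virtualization layer can be pictured as an object that knows how to reach each source but holds no copy of the data. The sketch below assumes two hypothetical sources exposed as Python callables. `VirtualView` and `customer_orders` are illustrative names, not any product's API. A real engine would also push filters and joins down to the sources instead of evaluating them in application code.

```python
from typing import Callable, Iterable

Row = dict
Source = Callable[[], Iterable[Row]]


class VirtualView:
    """Hypothetical virtual layer: stores how to reach each source, not the data itself."""

    def __init__(self, orders: Source, customers: Source) -> None:
        self._orders = orders
        self._customers = customers

    def customer_orders(self, country: str) -> list[Row]:
        # Both sources are read at query time, so every call sees their current state.
        names = {c["id"]: c["name"] for c in self._customers() if c["country"] == country}
        return [
            {"customer": names[o["customer_id"]], "amount": o["amount"]}
            for o in self._orders()
            if o["customer_id"] in names
        ]


# Hypothetical sources: an orders table and a CRM feed, modeled as plain lists.
orders_db = [{"customer_id": 1, "amount": 40}]
crm_api = [
    {"id": 1, "name": "Alice", "country": "US"},
    {"id": 2, "name": "Bob", "country": "DE"},
]
view = VirtualView(lambda: orders_db, lambda: crm_api)

first = view.customer_orders("US")
orders_db.append({"customer_id": 1, "amount": 15})  # a new order arrives at the source
second = view.customer_orders("US")  # no reload or copy step is needed
print(first)   # [{'customer': 'Alice', 'amount': 40}]
print(second)  # [{'customer': 'Alice', 'amount': 40}, {'customer': 'Alice', 'amount': 15}]
```

The point of the design is that the view holds references to sources rather than snapshots. A change at the source therefore shows up on the next query.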

ETL is a bulk-data (batch-processing) style that moves physical copies of the data into a central repository, applying transformations along the way so that the data is ready before target applications consume it.
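By contrast, an ETL job physically copies data in batches. In this sketch, an in-memory SQLite database created with Python's standard `sqlite3` module stands in for the central repository. The order records, the `orders` table, and the validation rule are all hypothetical.

```python
import sqlite3


def extract() -> list[tuple[str, str, str]]:
    """Extract: pull a batch of raw rows (order_id, amount as text, ISO date) from a source."""
    return [
        ("A1", "19.99", "2024-01-05"),
        ("A2", "5.00", "2024-01-06"),
        ("A3", "n/a", "2024-01-06"),  # malformed amount
    ]


def transform(rows: list[tuple[str, str, str]]) -> list[tuple[str, int, str]]:
    """Transform: validate amounts and convert them to integer cents."""
    clean = []
    for order_id, amount, day in rows:
        try:
            cents = round(float(amount) * 100)
        except ValueError:
            continue  # validation: reject rows whose amount is not numeric
        clean.append((order_id, cents, day))
    return clean


def load(conn: sqlite3.Connection, rows: list[tuple[str, int, str]]) -> None:
    """Load: write the batch into the central repository."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS orders "
            "(order_id TEXT PRIMARY KEY, amount_cents INTEGER, day TEXT)"
        )
        conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")  # stands in for the central repository
load(warehouse, transform(extract()))
totals = warehouse.execute(
    "SELECT day, SUM(amount_cents) FROM orders GROUP BY day ORDER BY day"
).fetchall()
print(totals)  # [('2024-01-05', 1999), ('2024-01-06', 500)]
```

Because the data is copied, changes made at the source reach the warehouse only on the next batch run.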

ESB is a message-based, near-real-time data integration style in which enterprise applications are integrated through a bus-like architecture.
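An ESB decouples applications by routing messages through a shared bus. The toy `MessageBus` class below is a hypothetical, in-process stand-in that only illustrates the pattern: publishers enqueue messages on a topic, and the bus delivers them to every subscriber. Real ESB products add durable queues, protocol adapters, routing, and transformation, none of which is modeled here.

```python
from collections import defaultdict, deque
from typing import Callable

Handler = Callable[[dict], None]


class MessageBus:
    """Toy in-process stand-in for an ESB: applications talk to the bus, not to each other."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)
        self._queue: deque = deque()

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Publishers only enqueue; they never call consumers directly.
        self._queue.append((topic, message))

    def dispatch(self) -> int:
        """Deliver all queued messages and return the number of deliveries made."""
        delivered = 0
        while self._queue:
            topic, message = self._queue.popleft()
            for handler in self._subscribers[topic]:
                handler(message)
                delivered += 1
        return delivered


bus = MessageBus()
shipped, invoiced = [], []
bus.subscribe("order.created", lambda m: shipped.append(m["id"]))   # e.g. a warehouse app
bus.subscribe("order.created", lambda m: invoiced.append(m["id"]))  # e.g. a billing app
bus.publish("order.created", {"id": 42})
delivered = bus.dispatch()
print(delivered, shipped, invoiced)  # 2 [42] [42]
```

Adding a third application only requires another `subscribe` call. The publisher is unchanged, which is the move away from point-to-point integration that ESBs aim for.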

When to use which data integration style?

Data virtualization is an excellent choice when structured, semi-structured, and unstructured data from both legacy and modern sources must be combined and delivered to end users. A data virtualization solution is also critical when data must be accessed and delivered in real time, which the other data integration styles cannot do. Data virtualization is well suited for both analytical and operational use cases.

ETL processes work well for bulk copying large data sets, transforming them, and delivering them to the target. They are designed and optimized to handle data sets with millions (or even billions) of rows. ETL processes are best suited for applications that require access to the complete consolidated data set, such as historical trend analysis or data mining.

As a data integration style, ESB is primarily beneficial when comparatively lightweight data transfers are needed across a set of enterprise applications, both legacy and modern, in a service-oriented architecture (SOA). ESBs focus mainly on service-enabling business logic, applications, and processes for transactional use.

Data integration in the modern world

In a rapidly changing IT landscape, it is prudent for companies to choose a future-proof data integration style.

ETL, as a legacy technology, has been successful in extremely high-volume data integration scenarios for analytical and operational use cases. However, it is inefficient, and often nearly unusable, for modern unstructured data sources or real-time data integration needs.

ESBs, on the other hand, are useful for data integration when all applications are SOA-ready, when companies want to create an application-agnostic communication layer using message-based communication, and when stakeholders want to move away from point-to-point integration. However, using ESBs for data services often reduces performance, increases initial and maintenance costs, and narrows the range of accessible data sources.

Data virtualization strikes a balance through data abstraction and support for a broad range of data sources, including service-oriented architectures. It also enables universal data governance and security, and offers a high ROI with lower operational costs. Data virtualization is not meant to replace ETL, ESB, or master data management (MDM) systems completely. Instead, it extends their functionality to help companies build out their enterprise data architecture 2.0.
