What is data integration?
Data integration is the process of combining data from various sources into a unified view. Organizations around the world use such unified views for business intelligence, reporting, and analytics.
Data integration primarily consists of three key steps:

Accessing the data from various heterogeneous sources, which may reside on-premises or in the cloud, using various access protocols.

Integrating the data by applying transformations such as data mapping, validation, normalization, and data quality checks, among other steps, depending on the integration style.

Delivering the data to the consumer, whether an end user or an application, through various protocols, for business reporting and analysis.
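As a minimal sketch of these three steps, the following Python example accesses records from two hypothetical sources (a CSV export and a JSON feed), integrates them into a common schema with basic validation and normalization, and delivers the unified view as JSON. The source data, field names, and transformation rules are invented for illustration.

```python
import csv
import io
import json

def access(csv_text, json_text):
    """Step 1: access data from heterogeneous sources (here, CSV and JSON)."""
    csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
    json_rows = json.loads(json_text)
    return csv_rows + json_rows

def integrate(rows):
    """Step 2: integrate -- map fields to a common schema, validate, normalize."""
    unified = []
    for row in rows:
        # data mapping: the two sources name the same field differently
        name = (row.get("name") or row.get("customer_name") or "").strip()
        if not name:  # data quality check: drop rows with no usable name
            continue
        unified.append({"name": name.title(),                      # normalization
                        "country": row.get("country", "").upper()})
    return unified

def deliver(rows):
    """Step 3: deliver the unified view to a consumer (here, as JSON)."""
    return json.dumps(rows)

crm_csv = "customer_name,country\n ada lovelace ,uk\n,us\n"
billing_json = '[{"name": "Grace Hopper", "country": "us"}]'

print(deliver(integrate(access(crm_csv, billing_json))))
```

The row with no name is rejected by the quality check, and the remaining records from both sources come out in one consistent schema.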
Primary data integration styles
Data integration has evolved over many years, and three primary styles have emerged: data virtualization; extract, transform, and load (ETL) processes; and enterprise service buses (ESBs). Data virtualization is the latest style in data integration.
Data virtualization is a real-time data integration style that creates a virtual, integrated data layer, which provides an abstraction above the physical data sources and delivers the combined information to consuming applications.
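The idea can be sketched as follows. The source classes, field names, and join logic are hypothetical, and a real data virtualization platform would push queries down to the sources rather than filter in memory; the point is that the virtual layer holds no copy of the data, and both sources are queried in real time when a consumer asks for an integrated record.

```python
class SqlSource:
    """Stand-in for a relational source; data stays here, behind the layer."""
    def __init__(self, rows):
        self._rows = rows
    def fetch(self, customer_id):
        return [r for r in self._rows if r["customer_id"] == customer_id]

class ApiSource:
    """Stand-in for a cloud/REST source."""
    def __init__(self, rows):
        self._rows = rows
    def fetch(self, customer_id):
        return [r for r in self._rows if r["customer_id"] == customer_id]

class VirtualCustomerView:
    """The abstraction layer: consumers query this view, not the sources."""
    def __init__(self, orders_db, support_api):
        self.orders_db = orders_db
        self.support_api = support_api
    def query(self, customer_id):
        # Nothing is materialized in advance: both physical sources are
        # hit at query time and combined into one integrated record.
        return {"customer_id": customer_id,
                "orders": self.orders_db.fetch(customer_id),
                "tickets": self.support_api.fetch(customer_id)}

view = VirtualCustomerView(
    SqlSource([{"customer_id": 1, "order": "A-100"}]),
    ApiSource([{"customer_id": 1, "ticket": "T-7"}]),
)
print(view.query(1))
```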
ETL is a bulk-data (batch processing) style that moves physical copies of the data to a central repository, where transformations are applied before the data is consumed by target applications.
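A minimal ETL sketch, using an in-memory SQLite database as a stand-in for the central repository; the table names and the transformation are invented for illustration. Rows are extracted in bulk from the source, transformed, and loaded as physical copies into the warehouse.

```python
import sqlite3

def extract(source_conn):
    """Extract: read rows in bulk from the source system."""
    return source_conn.execute("SELECT id, amount_cents FROM sales").fetchall()

def transform(rows):
    """Transform: convert cents to a decimal amount for the warehouse schema."""
    return [(row_id, cents / 100.0) for row_id, cents in rows]

def load(warehouse_conn, rows):
    """Load: write the physical copies into the central repository."""
    warehouse_conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact (id INTEGER, amount REAL)")
    warehouse_conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)
    warehouse_conn.commit()

# A toy source system with two rows
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (id INTEGER, amount_cents INTEGER)")
source.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 1999), (2, 500)])

# Run the batch: the warehouse now holds its own copy of the data
warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source)))
print(warehouse.execute("SELECT * FROM sales_fact ORDER BY id").fetchall())
```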
ESB is a message-based, near-real-time data integration style in which enterprise applications are integrated through a bus-like architecture.
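A toy sketch of the bus idea, with invented topic names and handlers: applications do not call each other point-to-point; each publishes messages to the bus, and the bus routes every message to whichever applications have subscribed.

```python
from collections import defaultdict

class ServiceBus:
    """A bus-like hub: publishers and subscribers only know the bus."""
    def __init__(self):
        self._subscribers = defaultdict(list)
    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)
    def publish(self, topic, message):
        # Route the message to every application subscribed to this topic
        for handler in self._subscribers[topic]:
            handler(message)

bus = ServiceBus()
shipped, invoiced = [], []

# Two independent applications integrate through the bus, not each other
bus.subscribe("order.created", lambda msg: shipped.append(msg["order_id"]))
bus.subscribe("order.created", lambda msg: invoiced.append(msg["order_id"]))

bus.publish("order.created", {"order_id": "A-100"})
print(shipped, invoiced)
```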
Through 2020, 35% of enterprises will implement some form of data virtualization as one enterprise production option for data integration.
When to use which data integration style?
Data virtualization is an excellent choice when a combination of structured, semi-structured, and unstructured data from legacy as well as modern data sources needs to be combined and delivered to end users. It is also critical when data must be accessed and delivered in real time, something no other integration style offers. Data virtualization is well suited to both analytical and operational use cases.
ETL processes work well for bulk-copying large data sets, transforming them, and delivering them to the target; they are designed and optimized to handle data sets with millions (or even billions) of rows. ETL is best suited to applications that require access to the complete consolidated data set, such as historical trend analysis or data mining.
ESB as a data integration style is primarily beneficial when comparatively lightweight data transfer is needed across a set of enterprise applications, both legacy and modern, in a service-oriented architecture (SOA). ESBs focus primarily on service-enabling business logic, applications, and processes for transactional use.
Data integration in the modern world
In this rapidly changing IT landscape, it is prudent for companies to choose a future-proof data integration style.
ETL, as a legacy technology, has been successful in extremely high-volume data integration scenarios for analytical and operational use cases, but it is inefficient, if not unusable, for modern, unstructured data sources and real-time data integration needs.
ESBs, on the other hand, are useful for data integration when all applications are SOA-ready, when companies want to create an application-agnostic communication layer using message-based communication, and when stakeholders want to move away from point-to-point integration. But using ESBs for data services often reduces performance, increases initial and maintenance costs, and narrows the range of accessible data sources.
Data virtualization strikes a fine balance: it provides data abstraction and supports a broad range of data sources, including SOA architectures. It also enables universal data governance and security, and offers high ROI with lower operational costs. Data virtualization does not aim to completely replace ETL, ESB, or master data management (MDM) systems, but to extend their functionality and help companies build out their enterprise data architecture 2.0.