5 Flavors of Data Virtualization - from "Feature" to "Enterprise Platform"
As data virtualization gains popularity, some of its features are being bundled into other products, either built in or as an add-on module.
This can be a good thing, particularly if the capability is included in the price of the other product.
However, it is important to be able to tell an add-on or built-in data virtualization product from an enterprise data virtualization platform, for several reasons:
- Breadth of capabilities may be very limited, particularly in supported sources, logical modeling, performance, security and governance.
- The product may be optimized to play an adjunct role to the vendor's main product, such as prototyping for an ETL / data warehousing or Master Data Management (MDM) project or tool, or providing a semantic layer for a BI tool. Its focus is therefore not on being a true, high-performance enterprise data virtualization layer that supports widely heterogeneous sources, consumers, and solution patterns.
- Vendor lock-in may require prerequisite products or add-ons from the same vendor to get the most value out of the data virtualization product.
The following list describes the five forms in which data virtualization is commonly offered:
- Data Blending - This is often included as part of a business intelligence (BI) tool's semantic (universe) layer, or is a new module offered by a predominantly BI vendor. Data blending can combine multiple sources (a limited list of structured or big data sources) to feed the BI tool. Its output may not support operational data.
- Data Services Module - These are typically offered at additional cost by Data Integration Suite (ETL / MDM / Data Quality) or Data Warehouse vendors. The suite is usually very strong in other areas. When it comes to data virtualization, features shared with the suite, such as modeling, transformation, and quality functions, are very robust, but the data virtualization engine, query optimization, caching, virtual security layers, flexibility of the data model for unstructured sources, and overall performance are weak. This is because the product is designed to prototype ETL or MDM, not to compete with it in production use.
- SQLification Products - This is an emerging offering, particularly among Big Data and Hadoop vendors. These products "virtualize" the underlying big data technologies and allow them to be combined with relational data sources and flat files and queried using standard SQL. This can be good for projects focused on that particular big data stack, but not for projects that reach beyond it.
- Cloud Data Services - These products are often deployed in the cloud and have pre-packaged integrations to SaaS and cloud applications, cloud databases, and a few desktop and on-premises tools like Excel. Rather than being a true data virtualization product with tiered views and delegatable query execution, these products expose normalized APIs across cloud sources for easy data exchange in projects of medium volume. Projects involving big data analytics, major enterprise systems, mainframes, large databases, flat files and unstructured data are out of scope.
- Data Virtualization Platform - Built from the ground up to provide data virtualization capabilities for the enterprise in a many-to-many fashion through a unified "virtual" data layer. It is designed for agility and speed in a wide range of use cases, is agnostic to sources and consumers, and both competes and collaborates with other, less efficient middleware.
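
To make "tiered views" and "delegatable query execution" a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular product works: the `customers` table, the orders CSV, and the three view functions are all hypothetical, and an in-memory SQLite database and an in-memory CSV string stand in for a real relational source and a real flat file. The base views read from their sources on demand, the region filter is delegated to the relational source as SQL, and a derived view combines both base views at query time without copying data into a separate store.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a relational database (in-memory SQLite stands in for it).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
db.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Acme", "EU"), (2, "Globex", "US"), (3, "Initech", "EU")],
)

# Hypothetical source 2: a flat file (an in-memory CSV string stands in for it).
ORDERS_CSV = "customer_id,amount\n1,100.0\n2,250.0\n1,40.0\n3,75.5\n"


def customers_view(region=None):
    """Base view over the relational source.

    The optional region filter is delegated to the source as a SQL WHERE clause.
    """
    sql, params = "SELECT id, name, region FROM customers", ()
    if region is not None:
        sql += " WHERE region = ?"
        params = (region,)
    sql += " ORDER BY id"
    for row in db.execute(sql, params):
        yield {"id": row[0], "name": row[1], "region": row[2]}


def orders_view():
    """Base view over the flat file; rows are parsed on demand."""
    for row in csv.DictReader(io.StringIO(ORDERS_CSV)):
        yield {"customer_id": int(row["customer_id"]), "amount": float(row["amount"])}


def customer_totals_view(region=None):
    """Derived view (second tier): joins both base views at query time."""
    totals = {}
    for order in orders_view():
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + order["amount"]
    for cust in customers_view(region):
        yield {
            "name": cust["name"],
            "region": cust["region"],
            "total": totals.get(cust["id"], 0.0),
        }


# A consumer queries only the derived view and never touches the sources directly.
result = list(customer_totals_view(region="EU"))
for row in result:
    print(row)
```

A real platform would add what this sketch leaves out, such as query optimization, caching, and security layers, which are the areas where the add-on flavors described above tend to be weakest.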