We live in a world of accelerating change, disruption, and technological breakthroughs that are transforming how we live and work. For organizations that want to be prepared, data is the secret weapon.
The problem is that, despite many decades of investment in data management, organizations still struggle to deliver data to the people who need it most, front-line workers and their leadership, so that they can get the insights they need, when they need them, and act with confidence.
As a result, many organizations face fear, uncertainty, and doubt in their data-related initiatives. They fear that it takes too long to get the data they need, that they cannot trust the data, that they cannot be assured of compliance, and that their data modernization projects won’t bear fruit.
“More of the same” isn’t working. It is time to rethink data management.
Data management refers to the process of organizing, storing, maintaining and protecting data throughout its lifecycle, and making it accessible to various users in the form and timeliness they need.
Data management consists of two complementary disciplines: physical data management and logical data management.
- Physical data management focuses on the physical storage and processing of data in data management systems or other storage technologies.
- Logical data management focuses on organizing and managing data based on the meaning and context needed by its end users, regardless of the data's physical location.
Both are equally important to ensuring that organizations get the maximum value from their data. However, logical data management is an overlooked, underused discipline, resulting in the data struggles many organizations continue to face today.
Here is a quick overview of these two data management disciplines.
Physical data management is the process of implementing and maintaining the physical storage, access, and processing of data in a database management system or other data storage technology. This includes defining and implementing data models according to the distinct characteristics of those technologies. With physical data management, access is limited to the formats and structures supported by each such technology, and it takes technical skill and time to access data and deliver it in the form and context that different users need.
Relational databases, NoSQL databases, data lakes, and data warehouses are all examples of physical data management systems.
Logical data management is a process of connecting to data remotely rather than physically replicating data into a shared repository. It offers a way to organize and manage data based on the meaning and context needed by its end users. It focuses on defining data elements, their relationships, and their attributes in a way that is understandable and meaningful to all users, business and technical alike, and, increasingly, to artificial intelligence.
The goal of logical data management is to get data into a form that all users can easily understand and maintain, and have it delivered to them on a timely basis. This includes defining data attributes and relationships, as well as defining business rules, and data quality and security standards. The result is data delivered in the language and speed required by its users, decoupled from its underlying physical databases and storage technologies.
Logical data management is critical for ensuring that data is well-organized, understandable, and meaningful for the entire organization. This is essential for effective data analytics, operations, and governance.
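To make the idea of decoupling concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product works: the `LogicalView` class, the two source functions, and all column names are hypothetical. Each view maps business-friendly attribute names onto the columns of one physical source and resolves that mapping only when the data is read, so nothing is replicated into a shared repository.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


@dataclass
class LogicalView:
    """A business-facing view over one physical source (hypothetical helper)."""
    name: str
    fetch: Callable[[], Iterable[Row]]  # reads rows from the physical source on demand
    mapping: Dict[str, str]             # business attribute -> physical column

    def rows(self) -> Iterator[Row]:
        # Translate each physical row into business terms at read time.
        for physical in self.fetch():
            yield {attr: physical[col] for attr, col in self.mapping.items()}


# Two made-up physical sources with different naming conventions.
def crm_source() -> List[Row]:
    return [{"cust_id": 1, "cust_nm": "Acme"}]


def billing_source() -> List[Row]:
    return [{"CUSTOMER_NO": 2, "NAME": "Globex"}]


crm = LogicalView("customer", crm_source,
                  {"customer_id": "cust_id", "customer_name": "cust_nm"})
billing = LogicalView("customer", billing_source,
                      {"customer_id": "CUSTOMER_NO", "customer_name": "NAME"})


def customers() -> List[Row]:
    """One logical 'customer' entity served from both sources."""
    return [row for view in (crm, billing) for row in view.rows()]
```

A consumer calling `customers()` sees only `customer_id` and `customer_name`. If one source were moved to a different storage technology, only that view's `fetch` function and `mapping` would need to change.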
Physical data management has been the predominant discipline for decades, spanning enterprise data warehouses in the 1990s and beyond, big data platforms in the 2010s, and cloud data warehouses and data lakes today. However, when used without a logical data management component, physical data management platforms leave end users struggling to access the data they need, when they need it. The reason is that the following activities must be performed before data can be readily accessed and shared with end users:
- Onboarding data from its original data sources, which are often highly distributed across on-premises and cloud environments.
- Continually maintaining the pipelines between data sources and main repositories so that data stays up to date.
- Preparing data so that it aligns with a wide variety of different users’ needs, including aggregations and transformations into forms they can readily consume.
- Publishing the data so that it is easily accessible while also ensuring its quality and security.
These activities often result in additional time and friction before data can be consumed by its end users.
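The four activities above can be sketched as a toy pipeline. This is a simplified illustration under stated assumptions: the function names, source names, and sample records are all hypothetical, and a real pipeline would involve far more than in-memory dictionaries. The point it shows is that published data reflects the sources only as of the last pipeline run.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def onboard(sources: Dict[str, List[Row]]) -> List[Row]:
    """Copy rows from every source into one central repository."""
    repository: List[Row] = []
    for name, rows in sources.items():
        for row in rows:
            repository.append({**row, "source": name})
    return repository


def prepare(repository: List[Row]) -> Dict[str, float]:
    """Aggregate amounts per region into a form users can consume."""
    totals: Dict[str, float] = {}
    for row in repository:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def publish(totals: Dict[str, float], published: Dict[str, float]) -> None:
    """Replace what end users see with the freshly prepared data."""
    published.clear()
    published.update(totals)


def run_pipeline(sources: Dict[str, List[Row]], published: Dict[str, float]) -> None:
    # "Maintaining" the pipeline here simply means re-running it.
    publish(prepare(onboard(sources)), published)


# Hypothetical sources spread across on-premises and cloud systems.
sources = {
    "on_prem_erp": [{"region": "EMEA", "amount": 100.0}],
    "cloud_crm": [{"region": "EMEA", "amount": 50.0},
                  {"region": "APAC", "amount": 25.0}],
}
published: Dict[str, float] = {}
run_pipeline(sources, published)

# A new record arrives at the source; users still see the old totals...
sources["cloud_crm"].append({"region": "APAC", "amount": 75.0})
stale_view = dict(published)

# ...until the pipeline runs again.
run_pipeline(sources, published)
```

Between the new record arriving and the second `run_pipeline` call, `stale_view` holds outdated totals, which is the kind of delay the activities above introduce.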
As a result, logical data management is increasing in popularity. Several data architectures and data management disciplines that embrace logical data management have emerged in recent years:
- Data fabric, which includes metadata-driven data integration and sharing across multiple underlying physical data sources. Data fabric enables organizations to provide business users with access to data, regardless of its physical location(s).
- Data mesh, which is an organizational principle that centers on the creation and maintenance of data products. These products are packaged to meet the needs of specific business functions or processes and are managed by dedicated data domains, independently of the various physical data sources that feed them.
- Data marketplaces, which include a central catalog in which end users can discover and provision available data assets in a self-service manner. Data marketplaces also involve setting and measuring SLAs for those assets’ quality, security, and other governance policies.
In all three, data end users do not necessarily see – or even know anything about – the underlying data sources. Instead, data is presented to them in the form and context they need, directly from the single access layer provided by a logical data management platform.
Capability | Logical | Physical |
---|---|---|
Scope | Organizing and classifying data at a higher level, according to the business context of its end users | The storage and processing of data in one or more physical databases or storage technologies |
Focus | The business meaning and inter-relationships of data | The technical aspects of data storage and retrieval |
Abstraction | Data models, structures, and relationships aligned with end-user context, independent of how data is physically maintained | Data models, structures, and relationships that are specific to the various physical database and storage technologies |
Data Independence | Provides data independence from the underlying physical data storage, enabling data to be manipulated and viewed in different ways without affecting the underlying data | A lack of independence, as data is always closely tied to the specific storage technology |
Role | Can be managed by data analysts, engineers, and scientists embedded within business teams | Typically managed by administrators of the various physical databases and storage technologies |
Data virtualization is a technology that implements logical data architecture and management. It integrates enterprise data siloed across disparate systems, manages the unified data under centralized security and governance, and delivers it to business users in real time.