Data lakes have become a popular architecture for enabling modern analytics and data science. However, fully replicating all corporate data into a giant data lake is infeasible: data volumes are too high, and replication across multiple systems creates brittle point-to-point connections. Out-of-sync data and uncontrolled replication lead to “data swamp” scenarios. A logical approach layered on top of the physical data lake is more practical: a logical layer connects different systems (the data lake among them) and exposes them as one, hiding the complexity of the back-end systems from the end user. Security, governance, and auditing are once again centralized.
As data volumes grow exponentially, optimization techniques have evolved to keep pace. Techniques such as complex query rewriting, on-the-fly data movement between sources, and massively parallel processing (MPP) provide the muscle to execute these workloads efficiently.
Attend this session to learn:
- How a logical data lake can overcome some of the issues of physical data lakes
- How MPP acceleration and other optimization techniques designed for large data volumes work
- How Denodo customers have implemented logical data lakes to improve the value of their Hadoop investments