Speaker: Pablo Alvarez, Director of Product Management, Denodo
Data lakes have become a popular architecture for modern analytics and data science. However, replicating all corporate data into giant data lakes is infeasible: data volumes are too high, and replication to multiple systems creates brittle point-to-point connections. Out-of-sync data and uncontrolled replication lead to “data swamp” scenarios. A logical approach on top of the physical data lake is more feasible: a logical layer that connects different systems (the data lake among them) and exposes them as one. The complexity of the back-end systems is hidden from the end user, and security, governance and auditing are centralized again.
As data volumes grow exponentially, optimization techniques have evolved to keep pace. Complex query rewriting, on-the-fly data movement between sources, and massively parallel processing (MPP) capabilities provide the processing muscle to run queries efficiently at this scale.
Attend this session to learn: