Data lakes are commonly centralised data repositories. Data needed by data scientists is physically copied to the data lake, which serves as a single storage environment. This way, data scientists can access all the data through one entry point: a one-stop shop for getting the right data. However, this approach is not always feasible for all data, and it limits the data lake's use to data scientists alone, making it a single-purpose system. So, what's the solution?
A multi-purpose data lake allows broader and deeper use of the data it holds, without reducing its value for data science and without turning it into an inflexible environment.
This session covers:
- The disadvantages and limitations that weaken, or even negate, the potential benefits of a data lake.
- Why a multi-purpose data lake is essential in building a universal data delivery system.
- How to build a logical multi-purpose data lake using data virtualization.