Definition
A data lake is a storage repository that holds vast amounts of raw data in its native format, so this data can be structured, unstructured, or semi-structured. This is in contrast to a data warehouse, in which data is structured in a common data model. While this allows the flexibility of ingesting different types of data into the data lake in different formats, this lack of structure can lead to a “data swamp” if not carefully managed. Since data lakes store raw data, analysis is flexible and can occur at a granular level to ad-hoc queries. At its core, a data lake is a data storage and processing repository in which all of the data in an organization can be placed and data flows into it and insights flow out. Cloud computing can be used for data lakes, or they can be stored locally, with Hadoop being a popular platform for local storage.
Hadoop
Azure