The design patterns of a business’s data lake are the foundation of its future development and operations. Security is necessary not only to enforce compliance, but also to keep the lake from becoming a "data swamp": a disorganized, undocumented collection of data.
Design
The overall design of a data lake is vital to its success. Coupled with security and the right organizational processes, the data lake will provide valuable new insights, ultimately giving executives more tools for making strategic decisions.
Conversely, a poorly designed data lake quickly becomes a mess of raw data with no efficient way to discover or acquire it, which increases extract, transform and load (ETL) development time and ultimately limits the data lake's success.
The primary design effort should focus on the storage hierarchy and how data will flow from one directory to the next. Though this is a simple concept, as laid out below, it sets up the entire foundation of how a data lake is leveraged, from the data ingestion layer to the data access layer.
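As a minimal sketch of such a hierarchy (the downstream zone and source names here are hypothetical), data might land and flow like this:

    /raw/{source}/{dataset}/{yyyy}/{MM}/{dd}/    <- landing area, data kept in its native format
    /raw/twitter/tweets/2017/06/01/              <- e.g. one day of one dataset from one source
    /curated/{subject}/{dataset}/                <- hypothetical downstream zone for operationalized data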

Data lake zones
Operational zones
Raw zone: This is the first area in the data lake, where data is landed and stored indefinitely in its native raw format. The data lives here until it is operationalized. Operationalization occurs once the business has identified value in the data, which does not happen until a measurement has been defined. The purpose of this area is to keep source data in its raw, unaggregated format; an example would be capturing social media data before knowing how it will be used. Access to this area should be restricted from most of the business. Transformations should not occur during ingestion into the raw zone; rather, data should be moved from source systems to the raw zone as quickly and efficiently as possible.
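For example, landing data in the raw zone can be a straight, untransformed transfer using a tool such as AdlCopy (the account names, container and paths below are hypothetical):

    AdlCopy /source https://landingstore.blob.core.windows.net/twitter/
            /dest swebhdfs://mydatalake.azuredatalakestore.net/raw/twitter/tweets/2017/06/01/
            /sourcekey <storage-account-key>

Because no parsing or aggregation happens in flight, ingestion stays fast and the source data remains intact for whatever future use the business identifies.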
Data tagging – both automated and manual – is also included in this zone. Tags for datasets can be stored in Azure Data Catalog, which allows business analysts and subject matter experts to understand what data lives not only in Azure Data Lake, but in Azure at large.
The folder structure for organizing data is separated by source, dataset and date ingested. Big data tools such as U-SQL can then consume data across multiple folders in a single, non-iterative pass by exposing parts of the folder path as virtual columns.
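As a sketch, assuming tweets land under /raw/twitter/tweets/{yyyy}/{MM}/{dd}/ as CSV files (the dataset and column names are hypothetical), a U-SQL file set can read every daily folder at once and surface the ingestion date as a virtual column:

    @tweets =
        EXTRACT id string,
                text string,
                ingestDate DateTime // virtual column derived from the folder path, not the file contents
        FROM "/raw/twitter/tweets/{ingestDate:yyyy}/{ingestDate:MM}/{ingestDate:dd}/{*}.csv"
        USING Extractors.Csv();

    // Filtering on the virtual column lets the engine prune whole folders
    // instead of iterating over them one at a time.
    @recent =
        SELECT id, text
        FROM @tweets
        WHERE ingestDate >= new DateTime(2017, 6, 1);

Because the date lives in the path rather than inside the files, new daily folders are picked up automatically without changing the script.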

