Why Use a Data Lake?

Posted by Trevor Warren, Data Architect

Nov 24, 2020


In the world of big data, there are many good reasons to use a data lake. After all, where else can you store both raw and organized data until you need it? When you are bombarded by information, a data lake is a great thing to have.

Keep that in mind as you develop a data strategy for your organization. As you build that strategy, you also need to understand the difference between a data lake and a data warehouse. The operative word is structured.

A cloud data warehouse stores only structured data in a standard format. Raw data must first be collected and processed to fit the warehouse's predefined schema before it can be loaded. Once it is there, the structured data is easy to analyze and is typically accessed frequently. In other words, a warehouse holds only data that has already been processed for a specific purpose.
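To make the warehouse side concrete, here is a minimal Python sketch of this "process first, then store" pattern, often called schema-on-write. It assumes a hypothetical `orders` table with three columns and uses an in-memory SQLite database purely as a stand-in for a cloud warehouse; the names `SCHEMA`, `transform`, and `load_into_warehouse` are illustrative, not part of any product's API.

```python
import sqlite3

# Hypothetical fixed schema: column name -> Python type each value is cast to.
SCHEMA = {"order_id": int, "amount": float, "region": str}

def transform(record: dict) -> tuple:
    """Cast a raw record to the warehouse schema; raise if it does not fit."""
    missing = SCHEMA.keys() - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Extra fields in the raw record are silently dropped here.
    return tuple(cast(record[col]) for col, cast in SCHEMA.items())

def load_into_warehouse(conn: sqlite3.Connection, raw_records: list[dict]) -> int:
    """Schema-on-write: only records that pass transform() are stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, region TEXT)"
    )
    rows = []
    for rec in raw_records:
        try:
            rows.append(transform(rec))
        except (ValueError, TypeError):
            continue  # rejected: does not match the predefined schema
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
raw = [
    {"order_id": "1", "amount": "19.99", "region": "EU"},
    {"order_id": "2", "amount": "5", "region": "US", "note": "extra field is dropped"},
    {"order_id": "oops", "amount": "3.50", "region": "US"},  # rejected: bad type
    {"amount": "7.00"},  # rejected: missing fields
]
print(load_into_warehouse(conn, raw))  # -> 2
```

Note what the sketch throws away: the malformed rows and the extra `note` field never reach the table. That up-front modeling and filtering is exactly the work, and the loss of flexibility, that the data lake approach defers.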

A data lake, on the other hand, accepts data in all formats: raw unstructured files, semi-structured files, and structured data. Some of the data stored in your lake might have a purpose in the future; other data simply stays there until a need arises.
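The lake side can be sketched the same way, this time as schema-on-read: files land exactly as received, and a schema is applied only when someone reads them for a particular question. This is a minimal illustration that assumes a local directory standing in for object storage and a hypothetical `raw/<source>/<file>` layout; `land` and `read_orders` are made-up helper names.

```python
import csv
import io
import json
import tempfile
from pathlib import Path

def land(lake_root: Path, source: str, filename: str, payload: bytes) -> Path:
    """Store raw bytes exactly as received: no parsing, no schema check."""
    target = lake_root / "raw" / source / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target

def read_orders(lake_root: Path) -> list[dict]:
    """Schema-on-read: interpret only the files this analysis needs."""
    orders = []
    for path in sorted((lake_root / "raw").rglob("*")):
        if path.suffix == ".jsonl":
            for line in path.read_text().splitlines():
                rec = json.loads(line)
                orders.append({"order_id": int(rec["id"]), "amount": float(rec["total"])})
        elif path.suffix == ".csv":
            for row in csv.DictReader(io.StringIO(path.read_text())):
                orders.append({"order_id": int(row["order_id"]), "amount": float(row["amount"])})
        # Everything else (images, logs, binaries, directories) is left
        # untouched in the lake until some future analysis needs it.
    return orders

with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    land(lake, "web", "orders.jsonl", b'{"id": 1, "total": 19.99}\n{"id": 2, "total": 5}\n')
    land(lake, "pos", "orders.csv", b"order_id,amount\n3,12.50\n")
    land(lake, "support", "screenshot.png", b"\x89PNG...")
    print(read_orders(lake))
```

The two order sources use different formats and field names. Only the reader reconciles them, and the screenshot is stored even though no current query uses it.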

A newer concept, data lakes can better address the needs of the big data world thanks to their “store-everything” approach and their ability to handle large volumes of many different data types. This data might not yet have a defined purpose or schema, but it can still hold the key to valuable insights and analysis.

Why Should I Use a Data Lake?

When developing a data strategy, it’s worth diving deeper into the data lake concept. Data warehouses work well for many organizations, but if you aren’t yet sure how you’ll use your organization’s data, take a closer look at a data lake, especially if you need large-scale batch storage and processing.

It’s cost-effective

Because a data lake stores data in its raw form, storage costs are low compared to a data warehouse. Loading data into a warehouse takes time and effort: you have to identify the pertinent data, create a data model, and choose the database structures and programs that will use the data for a specific purpose. That work is costly. Raw data, by contrast, is cheaper to store in a data lake and can be extracted as needed.

It’s flexible

Thanks to its “store-everything” approach, which keeps information out of silos, a data lake gives users scalable storage that adapts easily to growing amounts of data of all types. It can accommodate everything from XML files to multimedia and from binary data to logs, to mention only a few kinds of data you can drop into your lake. It can even accommodate high-velocity data, such as continuous streams of events.
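One common way to get fast-arriving data into a lake is micro-batching: buffer incoming events briefly, then write them out as small raw files. The sketch below is a hypothetical, simplified illustration. It uses a local directory in place of cloud storage, an assumed `raw/<source>/<date>/<hour>/` partition layout, and a made-up `MicroBatchWriter` class. Production pipelines would typically rely on a streaming or ingestion service instead.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

class MicroBatchWriter:
    """Buffer incoming events and land them in the lake as small raw files."""

    def __init__(self, lake_root: Path, source: str, batch_size: int = 3):
        self.lake_root = lake_root
        self.source = source
        self.batch_size = batch_size
        self.buffer: list[dict] = []
        self.files_written = 0

    def write(self, event: dict) -> None:
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return  # nothing to land
        now = datetime.now(timezone.utc)
        # Hypothetical partition layout: raw/<source>/<YYYY-MM-DD>/<HH>/
        part_dir = (self.lake_root / "raw" / self.source
                    / now.strftime("%Y-%m-%d") / now.strftime("%H"))
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / f"batch-{self.files_written:06d}.jsonl"
        # Events are written as-is, one JSON object per line.
        path.write_text("".join(json.dumps(e) + "\n" for e in self.buffer))
        self.files_written += 1
        self.buffer.clear()

with tempfile.TemporaryDirectory() as tmp:
    writer = MicroBatchWriter(Path(tmp), source="clickstream", batch_size=3)
    for i in range(7):
        writer.write({"event_id": i, "type": "page_view"})
    writer.flush()  # land whatever is left in the buffer
    print(writer.files_written)  # -> 3
```

Batch size trades latency against file count. Larger batches mean fewer, bigger files but a longer wait before events become readable in the lake.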


Trevor Warren, Data Architect

Trevor has nearly a decade of experience solving problems in complex computer systems and improving processes. He earned a Master of Science in Data Science and is a Google Cloud Certified Professional Cloud Architect and Professional Data Engineer.
