
Data Lakehouse

What is a Data Lakehouse?

A Data Lakehouse is a modern data architecture that combines the low-cost, flexible storage of a data lake with the transactional guarantees and management capabilities of a data warehouse. A Data Lakehouse consists of an open table format, such as Apache Hudi, Delta Lake, or Apache Iceberg, together with a set of programs for performing transactional updates and database management operations on the data files stored in that table format.
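The core mechanism shared by these table formats is an append-only transaction log: data lives in immutable files, and each table version is defined by the log entries committed so far. The following is a minimal, hypothetical sketch of that idea in plain Python (the class name ToyTable and the on-disk layout are illustrative assumptions, not the actual Hudi, Delta Lake, or Iceberg implementation):

```python
import json
import os
import tempfile

class ToyTable:
    """Toy illustration of a log-based table format (not a real implementation)."""

    def __init__(self, root):
        self.root = root
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self):
        # Committed versions, recovered by listing the log directory.
        return sorted(int(f.split(".")[0]) for f in os.listdir(self.log_dir))

    def commit(self, added_files):
        # Each commit appends one JSON entry naming the files it adds;
        # the entry becoming visible in the log is what makes the commit atomic.
        versions = self._versions()
        version = versions[-1] + 1 if versions else 0
        with open(os.path.join(self.log_dir, f"{version}.json"), "w") as f:
            json.dump({"version": version, "add": added_files}, f)
        return version

    def snapshot(self, as_of=None):
        # Replaying the log up to a version yields that version's file list;
        # this is what enables consistent reads and time travel.
        files = []
        for v in self._versions():
            if as_of is not None and v > as_of:
                break
            with open(os.path.join(self.log_dir, f"{v}.json")) as f:
                files.extend(json.load(f)["add"])
        return files

root = tempfile.mkdtemp()
table = ToyTable(root)
table.commit(["part-0.parquet"])
table.commit(["part-1.parquet"])
print(table.snapshot())          # current table: both files
print(table.snapshot(as_of=0))   # time travel: only the first file
```

Real table formats add much more on top of this (schema evolution, deletes and updates via file rewrites, concurrency control), but the log-defines-the-table principle is the same.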

Why is a Lakehouse important?

Lakehouses are often used to store the raw data used to compute features. They provide a centralized platform for storing and processing data that is both flexible and cost-effective. Lakehouses also support both batch and real-time processing, making it easier to work with both historical and streaming data.

Example use of a Data Lakehouse

Apache Hudi is supported by AWS, Hopsworks, and Onehouse. Delta Lake is supported by Databricks. Apache Iceberg is supported by Snowflake and AWS. Apache Spark is a popular data-parallel processing engine for performing ETL operations on Data Lakehouse tables.


© Hopsworks 2024. All rights reserved. Various trademarks held by their respective owners.
