back chevron
Back to Events
back chevron
Back to Events

High speed data from the Lakehouse to DataFrames with Apache Arrow

High speed data from the Lakehouse to DataFrames with Apache Arrow
No items found.

High speed data from the Lakehouse to DataFrames with Apache Arrow

calendar icon
December 8, 2023
calendar icon
clock icon
1:00 pm
UTC
clock icon
UTC
clock icon
Online

In this talk, we will explore the importance of Apache Arrow in the evolution of data for DataFrames and examine the huge performance benefits of using Apache Arrow end-to-end in integration with Python clients with data lakes and warehouses.

DataFrames are widely used for data transformations and analysis. DataFrames have historically been created from files in CSV, JSON, or Parquet file formats. As DataFrames have become more widely adopted for data processing pipelines, they have acquired new capabilities for both reading and writing from/to databases and data lakes.

In 2023, with the introduction of Pandas2, Apache Arrow became the dominant standard for both in-memory representation and over-the-wire transfer format for data in DataFrames. Now, data can moved at little cost between DataFrame frameworks such as Pandas2, Polars, and even distributed DataFrames, such as PySpark. Network protocols such as Arrow Flight and ADBC enable data to flow from columnar datastores, like data warehouses, directly to Pandas2 clients without serialization/deserialization or transformations between row-oriented and column-oriented formats (as  required by JDBC/ODBC APIs).

In this talk, we will explore the importance of Apache Arrow in the evolution of data for DataFrames from CSV and Parquet file formats, to tabular formats, such as Apache Hudi, Iceberg, and Delta Lake. We will examine the huge performance benefits of using Apache Arrow end-to-end in integration with Python clients with data lakes and warehouses. We will also look to the future at work we have been doing in the open-source Hopsworks platform, on network-hosted (serverless), incremental tables for DataFrames - backed by Arrow End-to-End.

Sign up here

Register now!

Thank you for registering!
Oops! Something went wrong while submitting the form, please check your details again.

Tags

You might also be interested in:

© Hopsworks 2024. All rights reserved. Various trademarks held by their respective owners.

Privacy Policy
Cookie Policy
Terms and Conditions