Events

[Building Blocks] Accelerating Apache Spark workloads with Apache DataFusion Comet (Andy Grove)

Date

Mon Sep 30, 2024

Time

04:30pm ET

Location

ZOOM

Speaker

Andy Grove

Apache Spark is one of the most widely used distributed data analysis frameworks. However, its JVM-based, row-oriented query execution engine limits Spark’s performance and scalability. In this talk, we will introduce DataFusion Comet, an accelerator for Apache Spark designed to improve the efficiency of Spark queries by translating them into native queries that leverage Apache Arrow and Apache DataFusion. We will explore the core architecture of Comet, explain how Spark plans are translated into native plans, and discuss some of the challenges of providing Spark compatibility.

This talk is part of the Database Building Blocks Seminar Series.

Zoom Link: https://cmu.zoom.us/j/95283696582 (Passcode 787637)

Bio:
Andy Grove is an Apache Arrow & Apache DataFusion PMC Member and the original creator of Apache DataFusion.