[Building Blocks] Accelerating Apache Spark workloads with Apache DataFusion Comet (Andy Grove)
- Speaker:
- Andy Grove
- Date:
- Mon Sep 30, 2024 @ 04:30pm EDT
- Location:
- https://cmu.zoom.us/j/95283696582?pwd=dn4nharXNC7lu3WCdCXdE2dYWfBB0u.1 (Zoom)
- Title:
- Accelerating Apache Spark workloads with Apache DataFusion Comet
- System:
- DataFusion
- Video:
- YouTube
Talk Info:
Apache Spark is one of the most widely used distributed data analysis frameworks. However, its JVM-based, row-oriented query execution engine limits Spark’s performance and scalability. In this talk, we will introduce DataFusion Comet, an accelerator for Apache Spark designed to improve the efficiency of Spark queries by translating them into native queries that leverage Apache Arrow and Apache DataFusion. We will explore the core architecture of Comet, explain how Spark plans are translated into native plans, and discuss some of the challenges of providing Spark compatibility.
This talk is part of the Database Building Blocks Seminar Series.
Bio:
Andy Grove is an Apache Arrow & Apache DataFusion PMC Member and the original creator of Apache DataFusion.