[Building Blocks] Accelerating Apache Spark workloads with Apache DataFusion Comet (Andy Grove)
Apache Spark is one of the most widely used distributed data analysis frameworks. However, its JVM-based, row-oriented query execution engine limits Spark’s performance and scalability. In this talk, we will introduce DataFusion Comet, an accelerator for Apache Spark designed to improve the efficiency of Spark queries by translating them into native queries that leverage Apache Arrow and Apache DataFusion. We will explore the core architecture of Comet, explain how Spark plans are translated into native plans, and discuss some of the challenges of providing Spark compatibility.
This talk is part of the Database Building Blocks Seminar Series.
Bio:
Andy Grove is a PMC member of Apache Arrow and Apache DataFusion and the original creator of Apache DataFusion.