[Vaccination 2022] Velox: An Open-source Unified Execution Engine (Deepak Majeti)
Data keeps getting bigger and processing keeps getting more complex, but hardware is not getting faster. We need to reconsider efficiency from the ground up. Although data processing systems handle different workloads (e.g., “batch”, “analytical”, “streaming”, “AI/ML”), they share common features such as functions, joins, filter pushdown, sorting, grouping, and projections. What is needed is a shared library that provides optimized implementations of this common functionality and can consolidate these systems.
The Velox project is being developed to address these needs. Velox provides optimized modules that can be composed to build a variety of data processing systems. These modules include a generic type system, data buffers with multiple encodings, an expression evaluator, function packages, operators, an I/O subsystem, network serializers, and resource managers. This talk will cover these modules and their benefits in depth.
Velox is being adopted by data processing systems such as Presto, Spark, and PyTorch. In this talk, we will discuss an implementation of the Presto worker on top of Velox. Early results show that Presto on Velox requires up to 3X less hardware for the same performance.
As an open-source project, Velox has reaped the benefits of community development. In this talk, we will also share some of our experiences working with the open-source community.
This talk is part of the Vaccination Database (Booster) Tech Talk Seminar Series.
Deepak Majeti is a principal engineer at Ahana, where he is working towards making Presto a turnkey high-performance analytical engine in the cloud. He is also an active contributor to Velox, a next-generation vectorized data processing library. Deepak is a PMC member for the Apache ORC project and a committer for the Parquet project. Deepak is passionate about blurring the line between Big Data and High-Performance Computing.