[SQL or Death] Apache Pinot Query Optimizer
- Date:
- Mon Feb 24, 2025 @ 04:30pm EDT
- Location:
- Zoom: https://cmu.zoom.us/j/93441451665?pwd=i9IsGbJAvpBFIfuCjGE3joyNKBA2RL.1
- Title:
- Apache Pinot Query Optimizer
- Speakers:
- Yash Mayya, Gonzalo Ortiz
- Video:
- YouTube
Talk Info:
Apache Pinot is a distributed real-time OLAP database, part of a fast-growing segment designed for large-scale, user-facing analytics. Its primary query language is SQL, and it excels at low-latency queries, high throughput, and fresh data.
Currently, Pinot supports two SQL dialects, and we are building a compatibility layer to enable pluggable time-series query languages, with Uber’s M3 and PromQL as the first integrations. But these languages are just interfaces. Once parsed and validated, queries are transformed into Pinot’s relational algebra, optimized, and executed by the same query engine—where the real magic happens.
In this talk, we’ll explore what makes Pinot’s query engine fast and dive into some of the interesting optimization techniques we apply. While we’ll briefly cover classical strategies (such as disk layouts and indexing), the focus will be on the unique challenges of distributed real-time OLAP systems. This includes data shuffling strategies for joins and dynamic filtering, both critical for maintaining performance at scale.
This talk is part of the SQL or Death? Seminar Series.
Zoom Link: https://cmu.zoom.us/j/93441451665 (Passcode 261758)
Bio:
Yash Mayya is currently a Software Engineer at StarTree and an Apache Pinot committer working on Pinot's query engine. Prior to this, he worked on the Kafka Connect ecosystem at Confluent and is also a committer on the open-source Apache Kafka project. Yash is an open-source enthusiast with a passion for distributed systems and data infrastructure.
Gonzalo Ortiz is a software engineer with 15 years of experience in database query languages, with a particular focus on query language model transformations. He began his career at ETH Zurich before transitioning to the private sector, where he worked on a proxy that transformed MongoDB queries into Postgres. Then, he spent five years at Devo Inc., where he worked on a proprietary distributed database supporting queries in both SQL and LINQ. Since 2022, he has been working at StarTree Inc., contributing to Apache Pinot with a focus on the multi-stage query engine.