[Vaccination 2021] Star-Tree Index: Space-Time Trade Off in OLAP (Kishore Gopalakrishna)
The need for real-time analytics has proliferated in the modern data landscape. The industry is moving towards providing analytics to end users via interactive apps instead of traditional dashboards. Whether it’s a user-facing analytical application such as LinkedIn’s “Who Viewed My Profile” or an internal monitoring tool used by Uber’s city ops team to regulate trips in a region, it is imperative that the underlying analytical database is highly performant. For instance, LinkedIn handles 170K queries per second across 70+ user-facing applications.
Apache Pinot powers such use cases and many more that need to run high-throughput analytical queries at low latency with well-defined SLAs. In this talk, we will focus on how Pinot leverages the star-tree index to accelerate such real-time insights. The star-tree index lets the user pre-aggregate values across selected dimensions while still fine-tuning the trade-off between storage space and query latency. Pinot maintains partial aggregates alongside the raw data in the same table, which allows it to achieve high query performance without sacrificing query flexibility. Furthermore, a star-tree index can be configured dynamically, which saves users a lot of time when onboarding new data sets.
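To make the space-time trade-off concrete, here is a minimal sketch of the pre-aggregation idea in Python. It is a toy illustration, not Pinot's implementation: the row layout, the `metric` column, the function names, and the `max_leaf_records` threshold are all assumptions made for this example. Each node splits on the next dimension only while it holds more rows than the threshold, and every split adds a "star" child that stores the rows re-aggregated with that dimension collapsed.

```python
from collections import defaultdict

STAR = "*"  # marker meaning "any value of this dimension"


def aggregate(rows, dims):
    """Group rows by the given dimensions and sum the (hypothetical) metric column."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["metric"]
    return [dict(zip(dims, key), metric=total) for key, total in totals.items()]


def build(rows, dims, depth=0, max_leaf_records=2):
    """Build a node, splitting on dims[depth] while it holds too many rows."""
    if depth == len(dims) or len(rows) <= max_leaf_records:
        return {"dim": None, "children": None, "rows": rows}  # leaf keeps its rows
    dim = dims[depth]
    groups = defaultdict(list)
    for row in rows:
        groups[row[dim]].append(row)
    children = {
        value: build(group, dims, depth + 1, max_leaf_records)
        for value, group in groups.items()
    }
    # Star child: collapse this dimension and re-aggregate over the others.
    star_rows = aggregate([{**row, dim: STAR} for row in rows], dims)
    children[STAR] = build(star_rows, dims, depth + 1, max_leaf_records)
    return {"dim": dim, "children": children, "rows": None}


def query_sum(node, filters):
    """Sum the metric under equality filters, e.g. {"country": "US"}."""
    while node["children"] is not None:
        # Follow the value child for filtered dimensions, the star child otherwise.
        node = node["children"].get(filters.get(node["dim"], STAR))
        if node is None:
            return 0  # the filter value never occurs in the data
    # A leaf holds at most max_leaf_records rows (unless all dims are split),
    # so this final scan is short.
    return sum(
        row["metric"]
        for row in node["rows"]
        if all(row[d] == v for d, v in filters.items())
    )


dims = ["country", "browser"]
raw = [
    {"country": "US", "browser": "chrome", "metric": 10},
    {"country": "US", "browser": "firefox", "metric": 5},
    {"country": "CA", "browser": "chrome", "metric": 7},
    {"country": "CA", "browser": "safari", "metric": 3},
    {"country": "US", "browser": "chrome", "metric": 2},
]
tree = build(aggregate(raw, dims), dims, max_leaf_records=2)

print(query_sum(tree, {}))                     # total over all rows
print(query_sum(tree, {"browser": "chrome"}))  # answered via the star child on country
```

In this sketch, a smaller `max_leaf_records` produces more nodes and more stored partial aggregates (more space) but fewer rows to scan per query (less time); a larger value does the opposite.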
This talk is part of the Vaccination Database Tech Talk Seminar Series.
Kishore Gopalakrishna is a founding engineer at a stealth mode startup. Prior to that, he was the architect at LinkedIn’s analytics infra team. Kishore is passionate about solving hard problems in distributed systems. He has authored various distributed systems such as Apache Helix, a cluster management framework for building distributed systems; Espresso, a distributed document store; Apache Pinot, a real-time distributed OLAP engine; and ThirdEye, a platform for anomaly detection and root cause analysis at LinkedIn.
More Info: https://db.cs.cmu.edu/seminar2021/#db3