News & Events
[SQL Death] Apache Pinot Query Optimizer
Apache Pinot is a distributed real-time OLAP database, part of a fast-growing segment designed for large-scale, user-facing analytics. Its primary query language is SQL, and it excels at low-latency queries, high throughput, and fresh data. Currently, Pinot supports two SQL dialects, and we are building a compatibility layer to enable pluggable time-series query languages, with Uber's M3 and PromQL as the first integrations. But these languages are just interfaces. Once parsed and validated, queries are transformed into Pinot's relational algebra, Read More
[SQL Death] Towards Sanity in Query Languages
The relational model has stood the test of time and is the foundation of most database systems. But let's be honest -- its success is not because of SQL, but in spite of it. SQL's syntax is arcane, inconsistent, and bears little resemblance to the actual execution semantics of queries. Worse yet, SQL is not even a true standard -- every system implements its own incompatible dialect, creating a fractured ecosystem where portability and interoperability are afterthoughts. This lack of standardization Read More
[SQL Death] Larry Ellison was Right (kinda)! TypeScript Stored Procedures for the Modern Age
No one uses SQL to write business logic. It's written in a programming language with libraries, tests, type safety, and expressive syntax. Traditionally this was the domain of a backend team, who’d try to build enough functionality to keep the frontend team happy without breaking the database. This model hasn’t kept up with the needs of full stack developers though, so they’ve turned to platforms that expose the database directly to code running on client devices. This introduces a host Read More
SplitSQL: Practical Pushdown Cache for DataLake Analytics (Xiangpeng Hao)
Modern data analytics embrace a disaggregated architecture which decouples storage, cache, and compute into network-connected independent components. With disaggregated cache, a key design decision is whether to push down query predicates to the cache server. Without predicate pushdown, the cache must send all data to compute nodes, creating network bottlenecks. With predicate pushdown, the cache server evaluates predicates on cached data, but its limited computational resources become the bottleneck. In this talk, we introduce SplitSQL, a pushdown cache system with Read More
Amazon Redshift: re-innovating cloud analytics
In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools. This cloud service was a significant leap from the traditional on-premise data warehousing solutions, which were expensive, not elastic, and required significant expertise to tune and operate. Customers embraced Amazon Redshift and it became the fastest growing service in Read More
The Rise of Data Streaming Platforms
Apache Kafka and Apache Flink are powering a new category of data infrastructure called the data streaming platform (DSP). This provides an opportunity for each enterprise to take action on what’s happening in its business in real time. I will first provide an overview of DSP. DSP has both similarities and differences to database systems. I will show how existing database technologies can be used in this new platform and some of the unique problems that DSP needs to solve. I Read More
Evolution of the Storage Engine for Spanner, an Exabyte-scale Database System
I'll describe the design of Spanner's new storage engine, Ressi, which replaced untyped sorted string tables (inherited from Bigtable) with a strongly typed SQL-native representation. Live migration of 6 exabytes of data and multiple billion-user products to the new engine posed unique challenges. Sound methodology from experimental computer science was the key to its success. The simplicity and power of declarative queries combined with strongly consistent transactional semantics have scaled to many thousands of machines running an aggregate of over Read More
Snowflake, and why the Cloud reshaped the analytics industry
Snowflake was the first data warehouse designed from scratch to take advantage of Cloud economics. We'll talk about what that means, why it was such a big deal, and how its design differs from the approaches taken by similar systems. Stay until the end for some bonus content on how Snowflake is bringing stream processing into the DBMS. Zoom link: https://cmu.zoom.us/my/jignesh
AI Vector Search in the Oracle Database
AI Vector Search in Oracle Database is a new, transformative way to intelligently, efficiently, and accurately search business data by using AI techniques to search data by semantics, or meaning. With the inclusion of a new VECTOR data type, new approximate search indexes, and new SQL operators and extensions, enterprise companies can quickly and easily leverage AI Vector Search to build modern, generative AI applications in just a few lines of SQL. And with this simplicity comes power, as AI Read More
Engineering Your Own Path: From University to Universal Impact (Camille Fournier)
SCS Distinguished Alumni / Bruce Nelson Distinguished Lecture