The Rise of Data Streaming Platforms
Apache Kafka and Apache Flink are powering a new category of data infrastructure called the data streaming platform (DSP). A DSP gives every enterprise the opportunity to act on what’s happening in its business in real time. I will first provide an overview of DSP. A DSP shares some similarities with database systems but also differs from them in important ways. I will show how existing database technologies can be used in this new platform, and describe some of the unique problems that a DSP needs to solve. I...
[Building Blocks] Biting the Bullet: Rebuilding GlareDB from the Ground Up (Sean Smith)
GlareDB is a database system enabling querying across a variety of data sources, including Snowflake, Postgres, and more. Building on top of DataFusion let us get to an early product very quickly. But not everything is sunshine and roses. In this talk, we'll explore some of the limitations we hit with DataFusion, and how we plan to address those in our upcoming engine Bullet. This talk is part of the Database Building Blocks Seminar Series. Zoom Link: https://cmu.zoom.us/j/95283696582 (Passcode 787637)
Engineering Your Own Path: From University to Universal Impact (Camille Fournier)
SCS Distinguished Alumni / Bruce Nelson Distinguished Lecture
Evolution of the Storage Engine for Spanner, an Exabyte-scale Database System
I'll describe the design of Spanner's new storage engine, Ressi, which replaced untyped sorted string tables (inherited from Bigtable) with a strongly typed SQL-native representation. Live migration of 6 exabytes of data and multiple billion-user products to the new engine posed unique challenges. Sound methodology from experimental computer science was the key to its success. The simplicity and power of declarative queries combined with strongly consistent transactional semantics has scaled to many thousands of machines running an aggregate of over...
[Building Blocks] Building InfluxDB 3.0 with the FDAP Stack: Apache Flight, DataFusion, Arrow and Parquet (Paul Dix)
This talk is part of the Database Building Blocks Seminar Series. Zoom Link: https://cmu.zoom.us/j/95283696582 (Passcode 787637)
AI Vector Search in the Oracle Database
AI Vector Search in Oracle Database is a new, transformative way to search business data intelligently, efficiently, and accurately by using AI techniques to match on semantics, or meaning. With a new VECTOR data type, new approximate search indexes, and new SQL operators and extensions, enterprises can quickly and easily leverage AI Vector Search to build modern generative AI applications in just a few lines of SQL. And with this simplicity comes power, as AI...
Snowflake, and why the Cloud reshaped the analytics industry
Snowflake was the first data warehouse designed from scratch to take advantage of Cloud economics. We'll talk about what that means, why it was such a big deal, and how its design differs from the approaches taken by similar systems. Stay until the end for some bonus content on how Snowflake is bringing stream processing into the DBMS. Zoom link: https://cmu.zoom.us/my/jignesh
[Building Blocks] Towards “Unified” Compute Engines: Opportunities and Challenges (Mehmet Ozan Kabak)
The architecture diagram of a typical data and AI infrastructure setup often features a primary compute engine (e.g., Apache Spark) alongside an array of supplementary tools for observability, AI integration, streaming support, memory management, interactivity, and more. While this modular architecture can be effective, it also introduces challenges around performance bottlenecks, maintenance costs, and integration complexity. In this talk, we will explore whether it is possible to simplify such complex architectures by addressing some of the core engine-level limitations that...
[Building Blocks] Exon: A Built for Purpose Bioinformatics Database (Trent Hauck)
Because it is no longer necessary to implement every component of a database engine from scratch, it’s now feasible to build databases that lean into the idiosyncrasies of specific domains to deliver a better user experience. Exon is one such database. Thanks to DataFusion, Exon can deliver a complete database while also offering capabilities that bridge the gap between bioinformatics and database systems. In this talk I’ll discuss some of the features that make Exon specially adapted to biodata and how those features come about due...
[Building Blocks] Accelerating Data and AI with Spice.ai Open-Source Software (Luke Kim)
Spice.ai OSS is an open-source, portable runtime designed to simplify building data and AI applications. It’s built on industry-leading technologies like Apache DataFusion, Apache Arrow, DuckDB and SQLite. In this talk, we tell the story of our path from building neurofeedback systems, to operating DuckDB at cloud scale, to building Spice.ai OSS for the intersection of high-performance data query and ML inference. We introduce Spice.ai OSS, demo some of its capabilities and use cases, explore the design principles and architecture of the platform, and go...