Archived Events

Sep 29 2025
[Future Data] Apache Hudi: A Database Layer over Cloud Storage for Fast Mutations and Efficient Queries
Speaker: Vinoth Chandar
System: Hudi
Video: YouTube

Data lakes emerged as a way to store vast amounts of data as files and objects on infinitely scalable cloud storage, with processing done on scalable distributed compute engines. However, this architecture lacks many of the capabilities of traditional databases, such as efficient mutations, indexing, and transaction management. Apache Hudi was created as the first "lakehouse" project, to bridge this...

Sep 23 2025
[Fall 2025] On Holistic Database Optimization via Leveraging Similarity Across Actions, Workloads, Configurations, and Scenarios (William Zhang)
Speaker: William Zhang

Modern database management systems (DBMSs) have evolved to support increasingly sophisticated data-intensive applications, at the cost of substantial configuration complexity, for two reasons. First, DBMSs expose a vast configuration space with trillions of possibilities that encompass system knobs, physical design (e.g., indexes), and query options, amongst others. Second, these applications are constantly evolving with changes in data access...

Sep 22 2025
[Future Data] An Extremely Technical Overview of how the Apache Iceberg™ Planning Implementation Actually Works
Speaker: Russell Spitzer
System: Iceberg
Video: YouTube

What are you trying to tell me? That I can read data fast? No, User. I'm trying to tell you that when you are ready, you won't have to. Everyone's heard about how fast Apache Iceberg is, and maybe you've even heard a few notes about "predicate pushdown" and "file metrics", but you've been left wanting more. You want to know...

Sep 16 2025
Industry Affiliates Program Visit 2025 – Day 2

The second day of Carnegie Mellon University's Database Industry Affiliate Program (IAP) Visit Day, held in the Gates-Hillman Center, shifts focus to the industry side, featuring a series of informative sessions presented by member companies. These sessions offer companies the opportunity to showcase their latest innovations, products, and challenges in the database space, while also highlighting potential career opportunities for...

Sep 15 2025
Industry Affiliates Program Visit 2025 – Day 1

The first day of Carnegie Mellon University's Database Industry Affiliate Program (IAP) Visit Day takes place in the Gates-Hillman Center and is focused on showcasing cutting-edge research in the field of databases. The day is filled with a series of research talks delivered by faculty and students from the university's database group. These presentations provide an in-depth look at the...

May 12 2025
DBSP: Incremental Computation on Streams and Its Applications to Databases
Speaker: Mihai Budiu
System: Feldera

We describe DBSP, a framework for incremental computation. Incremental computations repeatedly evaluate a function on some input values that are "changing". The goal of an efficient implementation is to "reuse" previously computed results. Ideally, when presented with a new change to the input, an incremental computation should only perform work proportional to the size of the changes of the input,...

Apr 23 2025
Real-world Applications of Gen AI and Databases
Speaker: Sailesh Krishnamurthy
System: AlloyDB

In this talk we will explore the transformative potential of integrating databases and generative AI in enterprise applications. As Large Language Models (LLMs) are being rapidly adopted, it's clear that they need to interact with a plethora of other systems in the enterprise, and agentic applications have emerged as the primary integration mechanism. Of these integration targets, databases are critical,...

Apr 22 2025
Architecture of Aerospike: Fast, Scalable, Geo-Replicated, Multi Model Database Supporting Strict Serializable Transactions
Speaker: Srinivasan "Sesh" Seshadri
System: Aerospike

Aerospike is a fast, scalable database that supports multiple data models, such as a key-value store with complex objects, graphs, and vectors. Aerospike supports synchronous replication within a cluster to guarantee linearizability. It also supports asynchronous replication across clusters for disaster recovery with minimal overhead to normal transaction processing. Finally, Aerospike supports ACID transactions and guarantees strict serializability. In...

Apr 21 2025
[SQL Death] Gel: Replacing* SQL and Improving on the Relational Database Model
Speaker: Michael Sullivan
System: Gel
Video: YouTube

Gel (formerly EdgeDB) is a new database built around an evolution of the relational model that we call "graph-relational". In the graph-relational model, data is represented as strongly typed objects containing set-valued scalar properties and links to other objects. Missing values are represented in the language as empty sets (no NULL!), and have consistent semantics. The query language, EdgeQL, supports...

Apr 14 2025
[SQL Death] MariaDB’s Query Optimizer: A Multi-tool That Does Some Things Differently
Speaker: Michael Widenius
System: MariaDB
Video: YouTube

MariaDB's query optimizer stems from MySQL's original implementation. It didn't come from a textbook; instead, it grew organically to meet the demands of the workloads we were targeting. This talk will discuss some of the uncommon choices we've made in MariaDB's new optimizer and what we've gained (and lost) as a result. This talk is part of the SQL or...