News & Events
[DB Seminar] JSON Relational Duality: Converging the worlds of Objects, Documents, and Relational
The "Object-Relational Impedance Mismatch" has been a multi-decade problem for developers, and past solutions have all involved tradeoffs that compromise efficiency or consistency. JSON Relational Duality is a breakthrough capability that combines the best aspects of the Document and Relational models without the drawbacks of either. This session will provide an overview and deep dive into the inner workings of JSON Relational Duality. We will also discuss some of the benefits of being able to Read More
Industry Affiliates Program Visit 2024 – Day 2
The second day of Carnegie Mellon University's Database Industry Affiliate Program (IAP) Visit Day, held in the Gates-Hillman Center, shifts focus to the industry side, featuring a series of informative sessions presented by member companies. These sessions offer companies the opportunity to showcase their latest innovations, products, and challenges in the database space, while also highlighting potential career opportunities for students. Attendees, including faculty, students, and other participants, can engage directly with company representatives to learn about real-world applications of Read More
Industry Affiliates Program Visit 2024 – Day 1
The first day of Carnegie Mellon University's Database Industry Affiliate Program (IAP) Visit Day takes place in the Gates-Hillman Center and is focused on showcasing cutting-edge research in the field of databases. The day is filled with a series of research talks delivered by faculty and students from the university's database group. These presentations provide an in-depth look at the latest advancements in database technologies, methodologies, and applications. Attendees, including industry partners, gain valuable insights into innovative projects, ongoing research, Read More
Announcing CMU’s Database Industry Affiliates Program
Pittsburgh, PA – The Carnegie Mellon Database Group is pleased to announce the launch of its new Industry Affiliates Program (IAP), designed to create stronger ties between academia and the tech industry. Through this initiative, industry leaders will collaborate with the group to drive cutting-edge research, contribute to database innovation, and help shape the next generation of database engineers. Members of the IAP have exclusive access to unique student recruitment opportunities, early-stage research, and an annual workshop aimed at solving Read More
[Fall 2024] Advancing Database Performance and Capabilities at Snowflake
This talk presents recent research and development at Snowflake aimed at pushing the boundaries of database performance and functionality. In the first section, we will introduce a series of optimizations designed to accelerate query execution within Snowflake’s platform. We will discuss the technical challenges associated with developing general-purpose optimizations and balancing performance improvements across a wide range of workloads. The second section will explore a novel database constraint we’re developing to enable continuous processing applications. A finalization constraint restricts the Read More
[Fall 2024] Databricks: Introduction to Mosaic AI Vector Search
This tech talk will deep dive into some of the most interesting challenges being solved at Databricks. Read More
LSM Management and Using LSM Immutability for Data Virtualization (Vaibhav Arora)
LSM (Log-Structured Merge) trees are now the bedrock of many storage engines and datastores, such as RocksDB, HBase, and Cassandra. They avoid random writes and provide immutability. Data is organized in multiple levels that increase exponentially in size. Each data mutation writes a new version of an object, and background processes called merge/compaction continuously remove the unused versions while moving the data across the levels of the LSM tree and maintaining its shape. This talk will describe how Read More
[Building Blocks] Apache OpenDAL: One Layer, All Storage (Xuanwo)
Apache OpenDAL is an Open Data Access Layer that enables seamless interaction with diverse storage services, guided by its mission of "One Layer, All Storage" and core tenets of being open, solid, fast, and extensible to serve various users from infrastructure builders to application developers. In this talk, we will explain OpenDAL in more detail and describe the abstractions it builds. We will discuss how OpenDAL helps developers build database systems. This talk is part of the Database Building Blocks Read More
[Building Blocks] Implement, Integrate and Extend a Query Engine (Ruihang Xia)
GreptimeDB uses Apache DataFusion and many other common building blocks in its implementation. This talk will focus on managing the query aspect of a (time-series) database across various parts. We have extended DataFusion to implement PromQL, add syntactic sugar to SQL, cooperate with external secondary indexes, and write domain-specific optimizer rules. Each of these extends a different stage of query execution. In addition to new features, we'll also discuss using DataFusion and Arrow as frameworks for implementing Read More
[Building Blocks] Biting the Bullet: Rebuilding GlareDB from the Ground Up (Sean Smith)
GlareDB is a database system enabling querying across a variety of data sources, including Snowflake, Postgres, and more. Building on top of DataFusion let us get to an early product very quickly. But not everything is sunshine and roses. In this talk, we'll explore some of the limitations we hit with DataFusion, and how we plan to address those in our upcoming engine Bullet. This talk is part of the Database Building Blocks Seminar Series. Zoom Link: https://cmu.zoom.us/j/95283696582 (Passcode 787637) Read More