Master Thesis Talk: Non-blocking Lazy Schema Changes in Multi-version Database Management Systems
The relational schema of a table in a database management system (DBMS) describes its logical attribute information and constraints. Despite the goal of separating the logical schema from physical data storage, in practice the schema often dictates how a DBMS organizes data on disk or in memory. This tight coupling arises because the database's physical schema must match its logical schema. The problem is that applications that incur frequent schema changes (e.g., adding a column, changing a column type)...
Spring 2019: Anil Goel (SAP)
SAP's HANA data management platform was architected from the ground up to leverage modern hardware technologies including large main memories, multi-core parallelism, SIMD architectures and vector processing, and to exploit software-hardware co-innovation. SAP HANA supports novel and existing applications with dramatically faster queries, access to up-to-date business data, and greatly simplified database administration. In this talk, we'll describe some key aspects of the internal design of the HANA system, explaining how HANA achieves orders of magnitude performance improvements. We will...
Spring 2019: Ippokratis Pandis (PhD’07, Amazon)
Amazon Redshift is a fast, fully managed, large-scale data warehouse solution that makes it simple and cost-effective to analyze large volumes of data using existing business intelligence tools. In this talk, we will dive into Redshift's architecture and discuss how we leverage fleet telemetry to prioritize the development process and to make Redshift achieve top-of-the-line performance at any data scale and concurrency. For that, we are going to focus on...
Ph.D. Program Acceptance Announcement: Tianyu Li
Tianyu Li is the top prospect for database graduate student applications in the 2019 admissions season (ranked #1 "Database Quarterly", #1 "DB All Stars 2019", #1 "ESPN"). He has been admitted to many of the top database Ph.D. programs: Berkeley, CMU, MIT, Stanford, Columbia, Wisconsin, Washington, Maryland. After long deliberation, Tianyu will be announcing his selection on April 15th @ 4:30pm EST. This event will be live streamed to the public. Live Stream: https://cmudb.io/phd2019 (Available April 15th) Media Relations: Please...
[DB Seminar] Spring 2019 Reading Group: Chenyao Lou
Chenyao will present the following paper at this meeting: Title: Noria: dynamic, partially-stateful data-flow for high-performance web applications Authors: Jon Gjengset, Malte Schwarzkopf, Jonathan Behrens, Lara Timbo Araujo, Martin Ek, Eddie Kohler, M. Frans Kaashoek, Robert Morris
[DB Seminar] Spring 2019 Reading Group: Gustavo Angulo
Gus will present the following paper in this seminar: Title: SageDB: A Learned Database System Authors: Tim Kraska, Mohammad Alizadeh, Alex Beutel, Ed H. Chi, Jialin Ding, Ani Kristo, Guillaume Leclerc, Samuel Madden, Hongzi Mao, Vikram Nathan Know Your Enemy
Spring 2019: Natacha Crooks (UT Austin)
Modern applications must collect and store massive amounts of data. Cloud storage offers these applications simplicity: the abstraction of a failure-free, perfectly scalable black box. While appealing, offloading data to the cloud is not without challenges. Cloud storage systems often favour weaker levels of isolation and consistency. These weaker guarantees introduce behaviours that, without care, can break application logic. Offloading data to an untrusted third party like the cloud also raises questions of security and privacy. This talk summarises my efforts...
Spring 2019: Alex Ratner (Stanford)
One of the key bottlenecks in building machine learning systems is creating and managing the massive training datasets that today’s models learn from. In this talk, I will describe my work on data management systems that let users specify training datasets in higher-level, faster, and more flexible ways, leading to applications that can be built in hours or days, rather than months or years. I will start by describing Snorkel, an open-source system for programmatically labeling training data that has...
[DB Seminar] Spring 2019 Reading Group: Matt Butrovich
Matt will present the following paper in this seminar: Title: Concurrent Prefix Recovery: Performing CPR on a Database Authors: Guna Prasaad, Badrish Chandramouli, Donald Kossmann
Spring 2019: Monte Zweben (Splice Machine)
This talk describes the Splice Machine Data Platform, designed to power today’s new class of Operational AI applications that require high scalability and high availability while simultaneously executing OLTP, OLAP, and ML workloads. Splice Machine is a full ANSI SQL, ACID-compliant database that supports secondary indexes, constraints, triggers, and stored procedures. It uses a unique, distributed snapshot isolation algorithm that preserves transactional integrity and avoids the latency of 2PC methods. The talk will present how the optimizer automatically evaluates...