News & Events
Spring 2019: Alex Ratner (Stanford)
One of the key bottlenecks in building machine learning systems is creating and managing the massive training datasets that today’s models learn from. In this talk, I will describe my work on data management systems that let users specify training datasets in higher-level, faster, and more flexible ways, leading to applications that can be built in hours or days, rather than months or years. I will start by describing Snorkel, an open-source system for programmatically labeling training data that has ...
[DB Seminar] Spring 2019 Reading Group: Matt Butrovich
Matt will present the following paper in this seminar: Title: Concurrent Prefix Recovery: Performing CPR on a Database Authors: Guna Prasaad, Badrish Chandramouli, Donald Kossmann
[DB Seminar] Spring 2019 Reading Group: Tianyu Li
Tianyu will present this paper in this meeting: Title: FASTER: A Concurrent Key-Value Store with In-Place Updates Authors: Badrish Chandramouli, Guna Prasaad, Donald Kossmann, Justin Levandoski, James Hunter, Mike Barnett
Spring 2019: Natacha Crooks (UT Austin)
Modern applications must collect and store massive amounts of data. Cloud storage offers these applications simplicity: the abstraction of a failure-free, perfectly scalable black-box. While appealing, offloading data to the cloud is not without challenges. Cloud storage systems often favour weaker levels of isolation and consistency. These weaker guarantees introduce behaviours that, without care, can break application logic. Offloading data to an untrusted third party like the cloud also raises questions of security and privacy. This talk summarises my efforts ...
[DB Seminar] Spring 2019 Reading Group: Lin Ma
Lin will present this work in this meeting: Title: Automatically Indexing Millions of Databases in Microsoft Azure SQL Database Authors: Sudipto Das, Miroslav Grbic, Igor Ilic, Isidora Jovandic, Andrija Jovanovic, Vivek R. Narasayya, Miodrag Radulovic, Maja Stikic, Gaoxiang Xu, Surajit Chaudhuri
Spring 2019: Monte Zweben (Splice Machine)
This talk describes the Splice Machine Data Platform, designed to power today’s new class of Operational AI applications that require high scalability and high availability while simultaneously executing OLTP, OLAP, and ML workloads. Splice Machine is a full ANSI SQL database that is ACID compliant and supports secondary indexes, constraints, triggers, and stored procedures. It uses a unique, distributed snapshot isolation algorithm that preserves transactional integrity and avoids the latency of 2PC methods. The talk will present how the optimizer automatically evaluates ...
[DB Seminar] Spring 2019 Reading Group: Prashanth Menon
Prashanth will present the following paper in this meeting: Title: Thriving in the No Man’s Land between Compilers and Databases Authors: Holger Pirk, Jana Giceva, Peter Pietzuch
[DB Seminar] Spring 2019 Reading Group: Dana Van Aken
Dana will present the following paper in this meeting: Title: Automated Performance Management for the Big Data Stack Authors: Anastasios Arvanitis, Shivnath Babu, Eric Chu, Adrian Popescu, Alkis Simitsis, Kevin Wilkinson
[DB Seminar] Fall 2018: Tianyu Li, Matt Butrovich, Sivaprasad Sudhir
Project 1: Storage Engine (Tianyu Li & Matt Butrovich) In this talk, we will discuss the work we've done on terrier's storage engine over the semester. We will cover the implementation of write-ahead logging and our proposed model for recovery, implementation of indexes, and our roadmap for the storage engine next semester. The immediate future direction for the storage work is to support Apache Arrow natively as our storage format to reduce ETL overhead to a data science pipeline, while ...
[DB Seminar] Fall 2018: Lin Ma
In the last two decades, both researchers and vendors have built advisory tools to assist database administrators (DBAs) in various aspects of system tuning and physical design. Most of this previous work, however, is incomplete because they still require humans to make the final decisions about any changes to the database and are reactionary measures that fix problems after they occur. What is needed for a truly self-driving database management system (DBMS) is a new architecture that is designed for ...