Self-Driving Databases & My Pregnant Wife: The Hard Parts — Tour 2019


Andy lives a database-centric lifestyle. That means that he spends most of his time either thinking about databases, writing about databases, using databases, teaching others about databases, or programming databases. Truly, his body is a vessel through which to conduct database research.

One day his beloved wife told him that if Stonebraker can have two kids, then he can at least have one. This logic seemed to make sense to him at the time. And now she's pregnant.

The idea of having to be responsible for a dependent is stressing Andy out. As such, he is going on a coast-to-coast speaking tour to discuss the challenges of research on self-driving databases while simultaneously trying to be a responsible life partner.

Self-Driving Databases & My Pregnant Wife: The Hard Parts

Abstract: The current research trend is on developing "learned" components to supplement and replace legacy components in database management systems (DBMSs). Such learned components use machine learning (ML) methods to identify non-trivial trends and correlations in the DBMS's runtime behavior. They then use this information to create execution strategies and data structures that are tailored to the application's access patterns. The hope is that learned components will enable new optimizations that are not possible today because the complexity of managing DBMSs has surpassed the abilities of humans. This could then lead to the ultimate goal of achieving a "self-driving" DBMS that is able to configure, manage, and optimize itself automatically as the database and its workload evolve over time. The bad news is that creating such a fully autonomous DBMS is harder than that. The problem requires both holistic systems engineering and novel ML solutions, and it cannot be solved by just adding learned components to an existing DBMS.

In this talk, I discuss the pressing unsolved problems in self-driving DBMSs. These include how to support training data collection, fast state changes, succinct state and action representations, and accurate reward observations. I will also present techniques for building a new autonomous DBMS and the steps needed to retrofit an existing one to enable automated control.

Andy Pavlo is an Associate Professor of Databaseology in the Computer Science Department at Carnegie Mellon University. His (unnatural) infatuation with database systems has inadvertently caused him to incur several distinctions, such as the NSF CAREER (2019), a Sloan Fellowship (2018), and the ACM SIGMOD Jim Gray Dissertation Award (2014).


Date | Location | Public? | Time
August 13 | CockroachDB, New York, NY | YES | 12:00pm
August 14 | MongoDB, New York, NY | NO | 12:00pm
August 15 | Two Sigma, New York, NY | NO | 12:00pm
August 19 | Snowflake, San Mateo, CA | NO | 1:00pm
August 20 | Google, Mountain View, CA | NO | 12:00pm
August 20 | Rockset, San Mateo, CA | YES | 6:00pm
August 21 | Oracle, Redwood City, CA | NO | 10:00am
August 21 | Yelp, San Francisco, CA | YES | 2:00pm
August 26 | AIDB @ VLDB, Los Angeles, CA | YES | 9:00am
September 18 | Cornell University, Ithaca, NY | YES | 12:00pm
October 31 | UPMC Magee-Womens Hospital, Pittsburgh, PA | |

Other Tours