Self-Driving Database Management Systems: Forecasting, Modeling, and Planning (Lin Ma)

Event Date: Tuesday November 10, 2020
Event Time: 01:00pm EDT
Speaker: Lin Ma

Title: Self-Driving Database Management Systems: Forecasting, Modeling, and Planning

Database management systems (DBMSs) are an important part of modern data-driven applications. However, they are notoriously difficult to deploy and administer. Existing methods can recommend physical designs or knob configurations for DBMSs, but most of them require humans to make the final decisions and to decide when to apply changes. Furthermore, they either (1) focus on only a single aspect of the DBMS, (2) react to workload patterns and shifts only after they occur, (3) require expensive exploratory testing on data copies, or (4) do not explain their decisions or recommendations. Thus, most DBMSs today still require onerous and costly human administration.

In this proposal, we present a design for self-driving DBMSs that enables automatic system management and removes the administrative impediments. Our approach consists of three frameworks: (1) workload forecasting, (2) behavior modeling, and (3) action planning. The workload forecasting framework predicts query arrival rates under varying database workload patterns using an ensemble of time-series forecasting models. The framework also uses a clustering-based technique to reduce the total number of forecasting models it must maintain. Our behavior modeling framework constructs and maintains machine learning models that predict the behavior of self-driving DBMS actions. It decomposes a DBMS’s architecture into fine-grained operating units to estimate the system’s behavior under unseen configurations.
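As a rough illustration of ensemble forecasting of query arrival rates, the sketch below averages two simple forecasters with equal weights. The two models (a seasonal-naive repeat and a least-squares linear trend), the equal weighting, and all function names are assumptions chosen for brevity; they are stand-ins and not the models used in the proposed framework.

```python
from statistics import mean


def seasonal_naive(history, horizon, period):
    """Repeat the most recent full period of observed arrival rates."""
    last_period = history[-period:]
    return [last_period[i % period] for i in range(horizon)]


def linear_trend(history, horizon):
    """Fit y = a + b*t by ordinary least squares and extrapolate."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = mean(history)
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + h) for h in range(horizon)]


def ensemble_forecast(history, horizon, period):
    """Average the two forecasts with equal weights (an assumed choice)."""
    seasonal = seasonal_naive(history, horizon, period)
    trend = linear_trend(history, horizon)
    return [(s + t) / 2 for s, t in zip(seasonal, trend)]
```

For example, `ensemble_forecast([1, 2, 3, 4], horizon=2, period=2)` averages the seasonal-naive forecast `[3, 4]` with the trend forecast `[5, 6]`.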

We propose to build the last framework, action planning, so that self-driving DBMSs make explainable decisions based on the forecasted workload and the modeled behavior. We aim to design a receding horizon control strategy that plans actions using Monte Carlo tree search. We will investigate techniques to reduce the action space and improve the search efficiency to ensure that the planning framework generates the action plan in time. Lastly, we will explore feedback mechanisms that incorporate observations of the applied actions to correct planning errors.
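The receding horizon idea can be sketched as follows: at each interval, plan over a short forecast window, apply only the first planned action, then replan. This is a toy under loud assumptions: exhaustive enumeration stands in for Monte Carlo tree search, the two-action set, the `step` cost model, and every constant are hypothetical, and an index is assumed to be usable in the same interval in which it is built.

```python
from itertools import product

ACTIONS = ("noop", "build_index")  # hypothetical action set


def step(has_index, action, load):
    """Hypothetical behavior model: return (next state, cost) for one interval."""
    action_cost = 0.0
    if action == "build_index" and not has_index:
        has_index = True
        action_cost = 50.0  # assumed one-time build cost
    per_query = 1.0 if has_index else 3.0  # assumed per-query cost
    return has_index, action_cost + load * per_query


def plan(has_index, window):
    """Return the cheapest action sequence for the forecast window.

    Exhaustive search is used here in place of Monte Carlo tree search.
    """
    def total_cost(seq):
        state, cost = has_index, 0.0
        for action, load in zip(seq, window):
            state, interval_cost = step(state, action, load)
            cost += interval_cost
        return cost

    return min(product(ACTIONS, repeat=len(window)), key=total_cost)


def receding_horizon(has_index, forecast, horizon):
    """Replan every interval and apply only the first planned action."""
    applied, total = [], 0.0
    for t in range(len(forecast)):
        best = plan(has_index, forecast[t:t + horizon])
        has_index, cost = step(has_index, best[0], forecast[t])
        applied.append(best[0])
        total += cost
    return applied, total
```

With a forecast of `[10, 10, 100, 100]` queries per interval and a horizon of 2, the planner defers the index build until the load spike enters its window.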

Lin is a Ph.D. student in the Computer Science Department at Carnegie Mellon University, advised by Andy Pavlo. His research interests are in database systems, data management, and machine learning.
