PhD Defense: Database Gyms: Towards Autonomous Database Tuning (Wan Shen Lim)
- Speaker: Wan Shen Lim
- Date: Mon Dec 15, 2025
- Time: 01:00pm EST
- Location: GHC 4405
- Title: Database Gyms: Towards Autonomous Database Tuning
Talk Info:
Database management systems (DBMSs) are the foundation of modern data-intensive applications. But as DBMSs gain features to support new workloads, they become increasingly complex and difficult to configure. Researchers have therefore invested decades of effort in autonomous DBMS configuration. Recent advances in machine learning (ML) have produced tools that outperform unassisted experts in real-world deployments. However, these tools are advisory and still require human expertise to integrate into database tuning pipelines.
Using these tools involves a multi-step process in which a human operator (1) determines an optimization objective, (2) selects a suitable tool, (3) sets up the DBMS, (4) runs a workload to collect telemetry, (5) uses the telemetry to calibrate the tool, and (6) operates the tool to obtain recommendations, which the operator must then review and apply. These ad hoc pipelines require significant human effort to set up, extend, and deploy. Moreover, interface differences make tools difficult to compose and interchange. Thus, despite the demonstrated ability of database tuning tools to improve performance and lower costs, the expertise required to operate them limits their adoption.
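As a rough illustration of this manual process (not taken from the dissertation), the sketch below lays out the six steps in Python; every class and function name here is hypothetical and stubbed out.

```python
"""Illustrative sketch of the manual, human-operated tuning pipeline.
All names (Objective, TuningTool, setup_dbms, run_workload) are hypothetical."""

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Objective:
    # (1) The operator decides what to optimize, e.g. tail latency or cost.
    metric: str
    direction: str  # "minimize" or "maximize"


@dataclass
class Telemetry:
    # (4) Metrics collected while replaying a workload against the DBMS.
    samples: List[Dict[str, float]] = field(default_factory=list)


class TuningTool:
    """Stand-in for an advisory tuning tool (e.g. a knob or index advisor)."""

    def calibrate(self, telemetry: Telemetry) -> None:
        # (5) Fit the tool's internal models on the collected telemetry.
        self.baseline = telemetry.samples

    def recommend(self, objective: Objective) -> Dict[str, str]:
        # (6) Produce recommendations for the operator to review.
        return {"shared_buffers": "8GB", "objective": objective.metric}


def setup_dbms(config: Dict[str, str]) -> None:
    # (3) Provision and configure the DBMS instance (stubbed out here).
    print(f"configuring DBMS with {config}")


def run_workload(name: str) -> Telemetry:
    # (4) Replay the workload and gather telemetry (stubbed out here).
    print(f"running workload {name}")
    return Telemetry(samples=[{"p99_latency_ms": 42.0}])


if __name__ == "__main__":
    objective = Objective(metric="p99_latency_ms", direction="minimize")  # (1)
    tool = TuningTool()                                                   # (2)
    setup_dbms({"version": "16", "hardware": "8 vCPU / 32 GB"})           # (3)
    telemetry = run_workload("tpcc")                                      # (4)
    tool.calibrate(telemetry)                                             # (5)
    recommendation = tool.recommend(objective)                            # (6)
    # The operator must still review and apply the recommendation by hand.
    print(recommendation)
```

Each numbered comment corresponds to one of the six steps above; the glue between them is exactly the ad hoc, per-deployment effort the abstract argues limits adoption.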
This dissertation presents the database gym, an integrated framework that systematizes and automates the DBMS configuration pipeline. Unlike prior research that focused on improving tool effectiveness with ML, the gym targets deployment and operational challenges by providing reusable, interoperable, and interchangeable components that simplify tool development and integration.
The gym's design reflects the observation that the bottleneck in database tuning has shifted from developing better algorithms for tools to acquiring the training data needed to operate them. We demonstrate how the gym's architecture accelerates and adapts tool-based database tuning pipelines through the systematic generation and utilization of training data, enabling the augmentation and orchestration of tools with end-to-end knowledge. For example, it reduces step-level overhead by skipping redundant computation during telemetry generation, thus reducing the tuning pipeline's latency. It also eliminates pipeline-level repetition by reusing training data to adapt a tool's calibrated models across new software versions and hardware environments. Such optimizations are enabled by the gym's holistic control over the entire tuning process.
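To make those two optimizations concrete, here is a minimal sketch of what a gym-style interface with end-to-end control could look like, loosely analogous to reinforcement-learning gym environments. The DatabaseGym class and its methods are hypothetical illustrations, not the dissertation's actual API.

```python
"""Hypothetical gym-style environment that owns the whole tuning loop:
it caches telemetry to skip redundant computation, and it carries training
data forward to new software/hardware environments."""

from typing import Dict, Tuple


class DatabaseGym:
    """Wraps DBMS setup, workload replay, and telemetry collection behind
    one interface, so tuning tools can be composed and interchanged
    without bespoke glue code."""

    def __init__(self, dbms_version: str, hardware: str) -> None:
        self.dbms_version = dbms_version
        self.hardware = hardware
        self._telemetry_cache: Dict[str, Dict[str, float]] = {}

    def step(self, config: Dict[str, str]) -> Tuple[Dict[str, float], float]:
        """Evaluate a candidate configuration and return (telemetry, reward).

        Because the gym controls telemetry generation end to end, identical
        configurations hit the cache instead of re-running the workload,
        reducing step-level overhead.
        """
        telemetry = self._observe(config)
        reward = -telemetry["p99_latency_ms"]  # lower latency -> higher reward
        return telemetry, reward

    def transfer(self, dbms_version: str, hardware: str) -> "DatabaseGym":
        """Reuse previously collected training data to warm-start tuning on a
        new software version or hardware environment, instead of repeating
        the whole pipeline from scratch."""
        new_env = DatabaseGym(dbms_version, hardware)
        new_env._telemetry_cache = dict(self._telemetry_cache)
        return new_env

    def _observe(self, config: Dict[str, str]) -> Dict[str, float]:
        key = repr(sorted(config.items()))
        if key not in self._telemetry_cache:
            # In a real system this would replay the workload; stubbed here.
            self._telemetry_cache[key] = {"p99_latency_ms": 42.0}
        return self._telemetry_cache[key]
```

A tuning tool would interact only with `step` and `transfer` in this sketch, which is one way the abstract's claims about skipping redundant telemetry work and reusing training data across environments could be realized.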
Zoom: https://cmu.zoom.us/my/capybara
Bio:
Wan is the #1 Ph.D. student in the Carnegie Mellon Database Group. He loves cheese.
More Info: https://csd.cmu.edu/calendar/2025-12-15/doctoral-thesis-oral-defense-wan-shen-lim