MS Thesis Defense: High Performance DBMS Design for Intelligent Query Scheduling (Deepayan Patra)
Decades of research in the field of database management systems (DBMSs) have focused on improving system performance with impressive results. Modern analytical databases take advantage of innovative methods such as vectorization and compilation to improve single query performance, use supporting data structures such as indexes or views to reduce data access requirements, and support the execution of multiple queries in parallel while maintaining necessary isolation guarantees.
We propose a new line of work on workload- and architecture-aware scheduling algorithms that improve system performance beyond the now-limited incremental gains of the aforementioned approaches. In a modern execution environment, where queries differ in their performance and parallelism characteristics and datasets predominantly reside in memory, resource allocation and system efficiency become paramount. Our proposed scheduling approaches take advantage of known query characteristics to intelligently order query sub-tasks in our execution environment.
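The abstract does not spell out the scheduling algorithms themselves. As a purely hypothetical illustration of how known query characteristics can drive the ordering of sub-tasks, the sketch below applies a classic shortest-remaining-work-first policy to queries split into pipelines with per-pipeline cost estimates, and compares it against FIFO on a single simulated worker. The `Query` type, the `schedule` function, the cost estimates, and the single-worker model are all assumptions made for illustration; this is not the thesis's method.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    # Hypothetical per-pipeline cost estimates (e.g. from an optimizer),
    # executed in the listed order.
    pipeline_costs: list[float]


def schedule(queries: list[Query], policy: str = "srpt") -> dict[str, float]:
    """Run every query's pipelines on one simulated worker (all queries
    arrive at time 0) and return each query's completion time.

    policy="fifo": run queries to completion in list order.
    policy="srpt": always run the next pipeline of the query with the
    least estimated remaining work.
    """
    # Copy the cost lists so the caller's Query objects are not mutated.
    remaining = {q.name: list(q.pipeline_costs) for q in queries}
    clock = 0.0
    done: dict[str, float] = {}

    if policy == "fifo":
        for q in queries:
            for cost in remaining[q.name]:
                clock += cost
            done[q.name] = clock
        return done

    # Min-heap keyed by (remaining work, arrival index); ties break by arrival.
    heap = [(sum(q.pipeline_costs), i, q.name) for i, q in enumerate(queries)]
    heapq.heapify(heap)
    while heap:
        work, i, name = heapq.heappop(heap)
        cost = remaining[name].pop(0)  # run one pipeline (sub-task) to completion
        clock += cost
        if remaining[name]:
            heapq.heappush(heap, (work - cost, i, name))
        else:
            done[name] = clock
    return done


if __name__ == "__main__":
    qs = [Query("A", [8.0, 4.0]), Query("B", [1.0, 1.0]), Query("C", [3.0])]
    fifo = schedule(qs, "fifo")
    srpt = schedule(qs, "srpt")
    print("FIFO average latency:", sum(fifo.values()) / len(qs))
    print("SRPT average latency:", sum(srpt.values()) / len(qs))
```

With every query arriving at once on one worker, this policy reduces to shortest-job-first; a real engine would additionally have to account for parallelism across workers and, as the abstract notes, NUMA placement.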
In this work, we discuss modifications that allow a highly optimized execution engine supporting both vectorization and compilation to run the newly proposed scheduling algorithms with minimal overhead. Changes to the execution architecture and in-memory data layout mitigate access-pattern and function-invocation overheads on the path to supporting NUMA-aware execution. These improvements make it possible to realize the performance benefits of more intelligent scheduling approaches, which, when implemented, reduce average query latency by over 30%.
- Andy Pavlo (Chair)
- Justine Sherry
More Info: https://www.cs.cmu.edu/calendar/163684971