On Embedding Database Management System Logic in Operating Systems via Restricted Programming Environments (Matt Butrovich)

Event Date: Wednesday August 23, 2023
Event Time: 01:00pm EDT
Location: GHC 4303
Speaker: Matt Butrovich

Title: On Embedding Database Management System Logic In Operating Systems Via Restricted Programming Environments

Advances in storage and network performance mean that disk I/O and network communication are often no longer the bottlenecks in database management systems (DBMSs). Instead, the overheads of operating system (OS) services (e.g., system calls, thread scheduling, and data movement from kernel-space) limit query processing responsiveness. To avoid these overheads, user-space applications that prioritize performance over simplicity can bypass these software layers with a kernel-bypass design. However, extracting benefits from kernel-bypass frameworks is challenging, and their libraries are incompatible with standard deployment and debugging tools. For these reasons, few DBMSs employ a kernel-bypass approach.

This proposal presents user-bypass: an approach to designing DBMS software that complements OS extensibility. With user-bypass, developers write safe, event-driven programs that push DBMS logic into the kernel's stack and avoid user-space overheads. We demonstrate user-bypass in two different DBMS applications. First, we present TScout, a framework for training data collection in self-driving DBMSs. User-bypass accelerates TScout's metrics collection by eliminating multiple round trips to kernel-space to retrieve performance counters and other resource counters. Second, we present Tigger, a PostgreSQL-compatible DBMS proxy similar to RDS Proxy, PgBouncer, and ProxySQL. Through user-bypass, Tigger supports features such as connection pooling, transaction multiplexing, and workload mirroring without user-space interaction.

We propose to extend this preliminary work by building a DBMS that executes queries with user-bypass, investigating the opportunities and limitations of placing core DBMS components in kernel-space. First, we will design and evaluate a storage manager that stores database entries in kernel-resident data structures, eliminating the need to return to user-space to execute queries. Second, we will investigate concurrency control methods that enforce ACID properties within the constraints of kernel execution (e.g., no waiting). Lastly, we will create a framework for logging and checkpointing the database contents stored in kernel-space. This effort will provide crash recovery and inform the design of further uses for logging, such as replication.

More Info: