PhD Defense: On Embedding Database Management System Logic in Operating Systems via Restricted Programming Environments (Matt Butrovich)

Event Date: Friday April 5, 2024
Event Time: 02:00pm EDT
Speaker: Matt Butrovich

Title: On Embedding Database Management System Logic In Operating Systems Via Restricted Programming Environments

The rise in computer storage and network performance means that disk I/O and network communication are often no longer bottlenecks in database management systems (DBMSs). Instead, the overheads associated with operating system (OS) services (e.g., system calls, thread scheduling, and data movement from kernel-space) limit query processing responsiveness. User-space applications can elide these overheads with a kernel-bypass design. However, extracting benefits from kernel-bypass frameworks is challenging, and the libraries are incompatible with standard deployment and debugging tools.

This thesis presents an alternative in user-bypass: a design that extends OS behavior for DBMS-specific features, including observability, networking, and query execution. Historically, DBMS developers have avoided kernel extensions for safety and security reasons, but recent improvements in OS extensibility present new opportunities. With user-bypass, developers write safe, event-driven programs to push DBMS logic into the kernel and avoid user-space overheads. There are two ways to invoke user-bypass logic: (1) when a DBMS in user-space invokes these programs, user-bypass provides behavior similar to a new OS system call, albeit without kernel modifications; (2) when an OS thread or interrupt triggers these programs in kernel-space, user-bypass inserts DBMS logic into the kernel stack.

First, we present a framework that employs user-bypass to collect training data for self-driving DBMSs efficiently. User-bypass programs reduce the number of round trips to kernel-space to retrieve performance counters and other system metrics. Next, we present a database proxy that applies user-bypass to support features like connection pooling and workload replication while reducing data copying and user-space thread scheduling. User-bypass programs embed DBMS network protocol logic in multiple layers of the OS network stack, applying DBMS proxy logic in a kernel-space fast path. Lastly, we present an embedded DBMS for future user-bypass applications. We discuss the design decisions, environment challenges, and performance characteristics of a DBMS that offers ACID transactions over multi-versioned data in kernel-space. We also explore applications of this user-bypass DBMS and compare them to modern user-space systems.

The techniques proposed in this thesis show user-bypass benefits across multiple DBMS design disciplines and provide a template for future DBMS and OS co-design.

Thesis committee:

  • Andrew Pavlo, Chair
  • Jignesh M. Patel
  • Justine Sherry
  • Samuel Madden, Massachusetts Institute of Technology