[DB Seminar] Fall 2016: Yingjun Wu
Multi-version concurrency control (MVCC) is currently the most popular concurrency control scheme in modern database management systems (DBMSs). Although the protocol dates back to the late 1970s, it is used in almost every major relational DBMS released in the last decade. Maintaining multiple versions of data potentially increases parallelism without sacrificing serializability. But scaling MVCC schemes in a multi-core, in-memory DBMS is non-trivial: when a large number of threads run in parallel, the synchronization overhead can outweigh the benefits of multi-versioning.
To understand how MVCC performs in modern hardware settings, we conducted an extensive study of the scheme’s four key design decisions: scheduling protocol, version storage, garbage collection, and index management. We implemented state-of-the-art variants of all of these in an in-memory DBMS and evaluated them using transactional and hybrid workloads. Our analysis identifies the fundamental bottlenecks of each design choice.