Are You Sure You Want to Use MMAP in Your Database Management System?

Andrew Crotty
Carnegie Mellon University
Viktor Leis
Friedrich-Alexander-Universität
TKBM
Andy Pavlo
Carnegie Mellon University

Memory-mapped (MMAP) file I/O is an OS-provided feature that maps the contents of a file on secondary storage into a program’s address space. The program then accesses pages via pointers as if the file resided entirely in memory. The OS transparently loads pages only when the program references them and automatically evicts pages if memory fills up.
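As a minimal illustration of the mechanism described above, the sketch below uses Python's standard `mmap` module. The two-page "database file" and the `demo` function are hypothetical. The program reads the second page by slicing the mapping at an offset instead of issuing an explicit `read()`, and the OS loads that page on demand. A write modifies the mapped page in memory. Here `flush()` forces it back to the file; otherwise the OS decides when the write-back happens.

```python
import mmap
import os
import tempfile

def demo() -> bytes:
    page = mmap.PAGESIZE  # OS page size (commonly 4096 bytes)
    # Hypothetical "database file" spanning two OS pages: all 'A's, then all 'B's.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"A" * page + b"B" * page)
        path = tmp.name
    try:
        with open(path, "r+b") as f:
            # Length 0 maps the whole file into the process's address space.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
                # Access the second page by offset, with no explicit read() call;
                # the OS loads the page on demand when it is first touched.
                second_page_byte = mm[page:page + 1]
                # Writes modify the mapped page in memory; flush() asks the OS
                # to write it back now instead of whenever the OS chooses.
                mm[0:5] = b"hello"
                mm.flush()
        with open(path, "rb") as f:
            on_disk = f.read(5)
        return second_page_byte + on_disk
    finally:
        os.remove(path)

print(demo())  # b'Bhello'
```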

MMAP’s perceived ease of use has seduced database management system (DBMS) developers for decades as a viable alternative to implementing a buffer pool. There are, however, severe correctness and performance issues with MMAP that are not immediately apparent. Such problems make it difficult, if not impossible, to use MMAP correctly and efficiently in a modern DBMS. In fact, several popular DBMSs initially used MMAP to support larger-than-memory databases but soon encountered these hidden perils, forcing them to switch to managing file I/O themselves at significant engineering cost.

In this way, MMAP and DBMSs are like coffee and spicy food: an unfortunate combination whose consequences become obvious only after the fact.

Since developers keep trying to use MMAP in new DBMSs, we wrote this paper to provide a warning to others that MMAP is not a suitable replacement for a traditional buffer pool. We discuss the main shortcomings of MMAP in detail, and our experimental analysis demonstrates clear performance limitations. Based on these findings, we conclude with a prescription for when DBMS developers might consider using MMAP for file I/O.
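For contrast with MMAP, here is a hypothetical, single-threaded sketch of the bookkeeping a traditional buffer pool performs explicitly. It is not the design of any particular DBMS or of the paper's benchmarks. The names `BufferPool`, `Frame`, `fetch`, `unpin`, `flush_all`, and the 4 KB `PAGE_SIZE` are illustrative assumptions. The pool reads pages into a fixed number of frames, never evicts a pinned page, picks victims in LRU order, and writes dirty pages back only on eviction or an explicit flush. With MMAP, the OS makes these loading and eviction decisions instead.

```python
from collections import OrderedDict
from dataclasses import dataclass

PAGE_SIZE = 4096  # hypothetical DBMS page size

@dataclass
class Frame:
    data: bytearray
    pin_count: int = 0
    dirty: bool = False

class BufferPool:
    """Hypothetical sketch: fixed frames, pin counts, LRU eviction, explicit I/O."""

    def __init__(self, file, num_frames: int):
        self.file = file              # any seekable binary file object
        self.num_frames = num_frames
        self.frames: "OrderedDict[int, Frame]" = OrderedDict()  # oldest first

    def fetch(self, page_id: int) -> bytearray:
        frame = self.frames.get(page_id)
        if frame is None:
            if len(self.frames) >= self.num_frames:
                self._evict()
            # Explicit read chosen by the DBMS, not a page fault chosen by the OS.
            self.file.seek(page_id * PAGE_SIZE)
            raw = self.file.read(PAGE_SIZE).ljust(PAGE_SIZE, b"\0")
            frame = Frame(bytearray(raw))
            self.frames[page_id] = frame
        self.frames.move_to_end(page_id)  # mark as most recently used
        frame.pin_count += 1              # pinned pages cannot be evicted
        return frame.data

    def unpin(self, page_id: int, dirty: bool) -> None:
        frame = self.frames[page_id]
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty

    def flush_all(self) -> None:
        # Dirty pages reach the file only when the DBMS decides to write them.
        for page_id, frame in self.frames.items():
            if frame.dirty:
                self._write(page_id, frame)
        self.file.flush()

    def _evict(self) -> None:
        # Choose the least recently used unpinned frame as the victim.
        victim = next((pid for pid, fr in self.frames.items() if fr.pin_count == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        frame = self.frames.pop(victim)
        if frame.dirty:
            self._write(victim, frame)

    def _write(self, page_id: int, frame: Frame) -> None:
        self.file.seek(page_id * PAGE_SIZE)
        self.file.write(frame.data)
        frame.dirty = False
```

Because every read, write-back, and eviction passes through these methods, the DBMS can decide, for example, that a modified page must not reach disk before its log record does. Under MMAP, the OS can write back a dirty page whenever it chooses.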

Recommended Music for this Paper:
Dr. Dre – High Powered (featuring RBX)

Source Code

The source code for the benchmarks in this paper is available on GitHub under the MIT license:
https://github.com/viktorleis/mmapbench

Citation

@inproceedings{crotty22-mmap💩,
  author = {Crotty, Andrew and Leis, Viktor and Pavlo, Andrew},
  title = {Are You Sure You Want to Use MMAP in Your Database Management System?},
  booktitle = {{CIDR} 2022, Conference on Innovative Data Systems Research},
  year = {2022},
}

Acknowledgments

This paper is the culmination of an unhealthy, years-long obsession with the idea of developers incorrectly using mmap in their DBMSs. The authors would like to thank everyone who contributed and provided helpful feedback: Chenyao Lou (PKU), David “Greasy” Andersen (CMU), Michael Kaminsky (BrdgAI), Thomas Neumann (TUM), Christian Dietrich (TUHH), Todd Lipcon (lipcon.org), and Sasha Fedorova (UBC).

This work was supported (in part) by the NSF (IIS-1846158, III-1423210, DGE-1252522), research grants from Google and Snowflake, and the Alfred P. Sloan Research Fellowship program.